Posts
SQA-032
FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification [paper]
SQA-031
Guiding a Diffusion Model with a Bad Version of Itself [paper]
SQA-030
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models [paper]
SQA-029
Group Normalization [paper]
SQA-028
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift [paper]
SQA-027
Transformers without Normalization [paper]
SQA-026
Analyzing and Improving the Training Dynamics of Diffusion Models [paper]
SQA-025
GIVT: Generative Infinite-Vocabulary Transformers [paper]
SQA-024
Jet: A Modern Transformer-Based Normalizing Flow [paper]
SQA-023
JetFormer: An Autoregressive Generative Model Of Raw Images And Text [paper]
SQA-022
CLIP: Learning Transferable Visual Models From Natural Language Supervision [paper]
SQA-021
Deep Equilibrium Approaches to Diffusion Models [paper]
SQA-020
PixelFlow: Pixel-Space Generative Models with Flow [paper]
WXB-005
Mamba: Linear-Time Sequence Modeling with Selective State Spaces [paper]
JZC-012
Scaling Vision with Sparse Mixture of Experts [paper]
JZC-011
The Impact of Initialization on LoRA Finetuning Dynamics [paper]
JZC-010
LoRA: Low-Rank Adaptation of Large Language Models [paper]
JZC-009
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models [paper]
JZC-008
S^4-Tuning: A Simple Cross-lingual Sub-network Tuning Method [paper]
JZC-007
Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning [paper]
SQA-019
Momentum Contrast for Unsupervised Visual Representation Learning [paper]
SQA-018
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis [paper]
SQA-017
Normalizing Flows are Capable Generative Models [paper]
WXB-004
Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation [paper]
WXB-003
Neural Ordinary Differential Equations [paper]
WXB-002
One Step Diffusion via Shortcut Models [paper]
WXB-001
The Road Less Scheduled [paper]
ZHH-017
MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis [paper]
SQA-016
A Simple Framework for Contrastive Learning of Visual Representations [paper]
ZHH-016
Masked Autoencoders Are Scalable Vision Learners [paper]
ZHH-015
Simplifying, Stabilizing & Scaling Continuous-Time Consistency Models [paper]
SQA-015
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [paper]
SQA-014
Emerging Properties in Self-Supervised Vision Transformers [paper]
SQA-013
Scalable Diffusion Models with Transformers [paper]
SQA-012
Consistency Models Made Easy [paper]
ZHH-014
Diffusion Models Beat GANs on Image Synthesis [paper]
SQA-011
Improved Techniques for Training Consistency Models [paper]
SQA-010
Consistency Models [paper]
SQA-009
Score-Based Generative Modeling through Stochastic Differential Equations [paper]
SQA-008
DDIM: Denoising Diffusion Implicit Models [paper]
SQA-007
A Connection Between Score Matching and Denoising Autoencoders [paper]
SQA-006
Elucidating the Design Space of Diffusion-Based Generative Models [paper]
SQA-005
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers [paper]
JZC-006
Denoising Diffusion Implicit Models [paper]
SQA-004
Building Normalizing Flows with Stochastic Interpolants [paper]
SQA-003
simple diffusion: End-to-end diffusion for high resolution images [paper]
SQA-002
Progressive Distillation for Fast Sampling of Diffusion Models [paper]
SQA-001
Classifier-Free Diffusion Guidance [paper]
ZHH-013
Flow Matching for Generative Modeling [paper]
ZHH-012
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation [paper]
ZHH-011
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model [paper]
ZHH-010
Autoregressive Image Generation without Vector Quantization [paper]
ZHH-009
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation [paper]
ZHH-008
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction [paper]
ZHH-007
An Image is Worth 32 Tokens for Reconstruction and Generation [paper]
ZHH-006
Improved Variational Inference with Inverse Autoregressive Flow [paper]
ZHH-005
Pixel Recurrent Neural Networks [paper]
ZHH-004
Invertible Residual Networks [paper]
ZHH-003
Glow: Generative Flow with Invertible 1x1 Convolutions [paper]
ZHH-002
Variational Inference with Normalizing Flows [paper]
ZHH-001
Deep Image Prior [paper]
JZC-005
Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise [paper]
JZC-004
Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages [paper]
JZC-003
Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving [paper]
JZC-002
Learning to Reason with Third-Order Tensor Products [paper]
JZC-001
LifeGPT: Topology-Agnostic Generative Pretrained Transformer Model for Cellular Automata [paper]