1D Tokenization Timeline
An Image is Worth 32 Tokens for Reconstruction and Generation (TiTok)
- Contribution
- First to propose 1D tokenization for image reconstruction and generation
- Previous work (e.g. CLIP, DINO…) outputs 1D tokens for classification, but these tokens still capture task-specific features
- Method
- ViT encoder
- Vector Quantizer
- ViT decoder
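
The middle step of the method list can be illustrated with a toy nearest-neighbour vector quantizer. This is only a minimal sketch, not the paper's implementation: the ViT encoder and decoder are omitted, the encoder is assumed to have already produced a short 1D sequence of latent vectors, and the names `quantize`, `lookup`, `codebook`, and `latent_tokens` are hypothetical. Codebook size and dimensions are made up for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent_tokens, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    indices = []
    for token in latent_tokens:
        distances = [squared_distance(token, code) for code in codebook]
        indices.append(distances.index(min(distances)))
    return indices


def lookup(indices, codebook):
    """Replace each index with its codebook vector (the decoder's input)."""
    return [codebook[i] for i in indices]


# Toy values (assumed): a 3-entry codebook of 2-D codes and 3 latent tokens,
# standing in for the 1D token sequence an encoder would output.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latent_tokens = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]

indices = quantize(latent_tokens, codebook)   # discrete 1D token ids
quantized = lookup(indices, codebook)         # vectors passed to a decoder
print(indices)  # [1, 0, 2]
```

The list of integer indices is the discrete 1D token sequence; a decoder would consume the looked-up vectors to reconstruct the image.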