Jason Zhao's Blog

    Notes on Reinforcement Learning

    This post serves as my reading notes. It may happen to be educational, but it is not intended as an educational post.

    Training Objectives

    List of Resources:

    1. Umar Jamil’s YouTube video on DPO
    2. Hugging Face DPO Post

    Tags: Reading Notes

    Categories: Notes

    Updated: April 12, 2026
