Contents:

🧑‍💼 Personal Statement

📝 Publications

🌟 Entrepreneurial Project

🏢 Internship

🏆 Contest

🔧 Project

✨ Volunteer Work

🌐 Open-Source Contribution

Personal Statement:

I’m Yinhan, an MPhil student majoring in AI at HKUST and an incoming PhD student there. I am interested in AIGC, especially image and video generation, and in multimodal large language models (MLLMs). I have extensive hands-on industry experience and am strongly self-motivated.

🛠️ What I do:

→ Code: Python/C++ → 🤖 NLP / AIGC → 🎥 image/video generation

→ Open-source Contribution: PRs to AI learning projects on GitHub (16K+ stars 🌟)

💡 Why work with me?

→ Solid coding skills: 1,000+ lines of code per week.

→ Reliable: I give timely feedback and progress updates throughout a project.

📍 Based in Shenzhen | Let’s chat AI bugs/open-source drama

📧 Email: 1513032551@qq.com | GitHub: Yinhan-Zhang

Publications:

🌟 [ICCV 2025] MagicColor: Multi-instance Sketch Colorization (first author, accepted to ICCV 2025)

▶ Project Link: https://yinhan-zhang.github.io/color/

🌟 Follow-Your-Creation: Empowering 4D Creation through Video Inpainting (under review at ICLR 2026, co-author)

▶ Project Link: https://follow-your-creation.github.io/

🌟 ScaleAdapter: An Efficient Fine-tuning Framework for Adapting Diverse Controls and Tasks to Diffusion Transformer (under review at ICLR 2026, first author, work done during internship at TeleAI)

🌟 InstanceAnimator: Instance-aware Sketch Video Colorization (under review at ICLR 2026, first author)

🌟 EVCtrl: Efficient Control Adapter for Visual Generation (under review at ICLR 2026, second author)

Entrepreneurial Project

🏢 Pyoh Programming (派呦编程)

🌟 MiraclePlus (奇绩创坛) Fall 2024 Demo Day project

Introduction: Pyoh Technology develops innovative, practical hardware products for programming education. With these products, children learn programming through play and develop logical thinking, which strengthens their programming ability and creativity. Compared with traditional programming education, Pyoh Technology’s hardware is highly interactive and engaging, and this “edutainment” approach suits the way modern children learn. The creativity-focused education that Pyoh Technology advocates also helps it stand out in the highly competitive programming education market.

More information: https://pyoh.cn/index.html

Internship

Nov 2024 - Present

🏢 China Telecom Artificial Intelligence Research Institute(中国电信 TeleAI) | AIGC Research Group

💼 Video Generation Research Intern

🎉 Key Contributions:

▶ Video Generation: Extended the CogVideoX/Wan2.1 models with multi-conditional inputs (camera parameters, 2D/3D pose, single/multi-person constraints) to achieve controllable human-motion video generation.

▶ Foundational Model Development: Co-developed VAST1.0, China Telecom’s first video generation base model, focusing on two-stage, cross-modal text-to-video generation.

▶ Data Pipeline Construction: Built an end-to-end data pipeline for video generation, reducing data preprocessing time by 40% by integrating automated annotation tools.

▶ Model Distillation: Distilled the condition-controlled Wan2.1 model toward one-step denoising inference.

July 2024 - June 2024

🏢 Tencent Hunyuan(腾讯混元) | Image Foundation Model Team

💼 AIGC Algorithm Intern

▶ Text-to-Image Model Optimization and Multi-Round Dialogue Image Generation System (based on HunyuanDiT)

Objective: Enhance the text-image alignment of the HunyuanDiT model through data-driven optimization.

🎉 Key Contributions:

▶ Caption Optimization: Applied data-driven caption optimization, improving generation fidelity by 5% (measured via CLIP score).
▶ Query Rewriting Engine: Developed a prompt-engineering framework based on Qwen-VL that transforms ambiguous user queries into structured prompts (style + composition + detail), achieving 92% intent accuracy on 100K dialogue samples. Built a fine-tuning dataset of multi-turn dialogues.
▶ Model Switching Strategy: Designed a lightweight classifier that dynamically selects the model branch (text2img/img2img) based on query complexity.
▶ Bad-Case Resolution: Classified failure modes and implemented a context-aware correction module.

Nov 2023 - Apr 2024

🏢 JD Retail (京东零售) | Technology Middleware-Algorithm Department(算法中台部)

💼 Large Model Algorithm Intern

Project: E-commerce Product Matching System (Text & Multi-modal)

🎉 Technical Contributions:

▶ Model Architecture:

Developed dual-text matching models using DeBERTa (fine-tuned on the e-commerce vertical) and InternLM-7B (few-shot learning), achieving semantic alignment of product titles and features.

Introduced unsupervised contrastive sampling:

Constructed 120K+ pseudo-labeled pairs (positives: same-item different titles; negatives: similar-category non-matching items)
Designed dynamic prompt templates (e.g., “[Brand]+[Function]+[Model] - Same Item?”) to boost long-tail category generalization.

▶ Dataset Engineering:

Built a multi-task hybrid dataset (search logs + product specs) and tuned the loss weight ratio (CE:Contrastive = 7:3), lifting category accuracy from 72% to 80% (8-point absolute gain, about 11% relative)
Implemented hard-negative mining, improving accuracy on the 10 most error-prone categories by 25%

▶ Multi-modal Matching (40 Core Categories)

Breakthrough: Replaced the legacy “text-image sequential filtering” pipeline with a Qwen-based multi-modal model

Our group annotated 80K+ cross-modal pairs (product images + titles + specs), defining alignment rules (e.g., “red dress” matches RGB 255-105-180)
Created “similar-but-not-same” dataset (e.g., mobile phones with different RAM versions) to enforce fine-grained discrimination
Designed “text-image-price” triple-branch fusion architecture on Qwen-VL, adding e-commerce prefixes
Introduced a weighted alignment loss for core attributes (brand/model/specs), improving accuracy from 82% to 90% (8-point absolute gain, about 10% relative)

July 2023 - November 2023

🏢 Baidu Financial Services(百度金融) | AI Innovation Business Unit

💼 NLP Algorithm Intern

Project: Construction of Intent Recognition and Intelligent Customer Service Large Model in the Financial Domain

Background: This project aimed to build an intent recognition system and an intelligent customer-service large model for the financial domain, replacing the manual customer-service process in loan guidance.

🎉 Technical Contributions:

🤖 Intent Recognition

▶ Model Training: Trained an intent recognition model based on RoBERTa.
▶ Data Optimization: Optimized the QA-pair training data. Together, these efforts raised the accuracy of recognizing users’ conversation intents to 0.85.

🤖 Intelligent Customer-Service Large Model

▶ Model Selection: Chose the ChatGLM2 model for the intelligent customer-service application.
▶ Model Training and Updating: Trained the SFT (supervised fine-tuning) and RW (reward modeling) models. After training the RW model, used it to update the SFT model. As labeled feedback data accumulated, both models were continuously updated to improve performance.

Contest

▶ 2023/2024 China Mobile Wutong Cup 🏆 National Third Prize | Top 10 | Team Leader

▶ 2024 Alibaba Cloud Large Model Agent Challenge 🏅 Technical Application Prize | TOP 20

▶ 2022 National College Student Internet+ Innovation and Entrepreneurship Competition 🏆 National Second Prize

▶ 2022 China Software Cup National College Student Software Design Competition 🏆 National Second Prize

▶ 2022 Tencent Rhinoceros Bird Open Source Program 🏅 Tencent Open Source Outstanding Student | The only undergraduate student selected

▶ 2022 BDCI Competition, Criminal Sentence Reduction Prediction 🏅 10th Place (10/482)

▶ 2023 Chinese Robotics and Artificial Intelligence Competition 🏆 National Third Prize | Team Leader

▶ 2022 National College Student Mathematical Modeling Competition 🏆 Second Prize, Shanxi Province

Project

Rhinoceros Bird Open Source Talent Development Program

2022.07 - 2022.09 Tencent Open Source Contributor

Project description:

Under a university-industry dual-mentorship model, and building on Tencent’s open-source project Angel, I studied graph representation learning and several GNN algorithms with guidance from mentors and explored research opportunities.

Main responsibilities:

Learned Hadoop, Spark, and graph representation learning algorithms under the guidance of the industry mentor.

Deployed Angel, Tencent’s high-performance machine learning platform, locally; developed the graph representation learning algorithm Struct2vec; and compared it against the classical DeepWalk and Node2vec algorithms. Achieved a 0.1 increase in average within-cluster similarity for clustering and a 0.26 improvement in classification accuracy.

Immersive Liangzhu: AR Interactive Series

Our team developed an intelligent AR application for museums that lets visitors converse with an ancient civilization.

Using ancient character data from the Liangzhu Culture in Zhejiang Province, trained Stable Diffusion to complete missing characters through image generation.
Developed a ChatGPT-based AR dialogue system, enhanced with a Liangzhu knowledge base and integrated with AR glasses.

Volunteer Work

“Dive into Deep Learning” (动手学深度学习), DataWhale Course Assistant

• The DataWhale team I belong to contacted Professor Li Mu and organized the “Dive into Deep Learning” course, which attracted 9,027 students from 733 universities around the world.

• I was one of the course assistants, answering questions and solving problems for students throughout the course.

Open-Source Contribution

Open-source AI learning projects:

-> LLM Cookbook (面向开发者的大模型手册, an LLM handbook for developers): https://github.com/datawhalechina/llm-cookbook | 16.2K stars 🌟🌟🌟
-> d2l-ai-solutions-manual (动手学深度学习习题解答, exercise solutions for Dive into Deep Learning): https://github.com/datawhalechina/ | 417 stars 🌟
-> Grape book (图深度学习, graph deep learning): https://github.com/datawhalechina/grape-book | 225 stars 🌟

Mutual AI:

I created this project to make dry, hard-to-understand AI algorithms engaging and to teach learners how to apply AI to real-life situations.

GitHub link: https://github.com/YinHan-Zhang/Mutual-AI

Yinhan Zhang