
Yinhan Zhang

Data Science / AI

Catalogue:

🧑‍💼 Personal Statement

📝 Publications

🌟 Entrepreneurial project

🏢 Internship

🏆 Contest

🔧 Project

Volunteer Work

🌐 Open-Source Contribution

Personal Statement:

I’m Yinhan, an MPhil student majoring in AI at HKUST (Guangzhou campus), GPA 3.813. I graduated from TYUT (a Project 211 university) with a bachelor’s degree in Data Science, GPA 85.34/100. I am most interested in AIGC, especially image and video generation and MLLMs. I have extensive hands-on industry experience and strong self-motivation.

🛠️ What I do:

Code: Python/C++ → 🤖 NLP / AIGC → 🎥 image/video generation

Open-source Contribution: PRs to GitHub AI Learning projects (16K+ stars🌟)

Business: Used LLMs to optimize e-commerce/chatbots/hardware

💡 Why work with me?

Solid coding skills: 1000+ lines/week (not LeetCode).

Responsible: I give timely feedback throughout a project.

📍 Based in Shenzhen | Let’s chat AI bugs/open-source drama

📧 Email: 1513032551@qq.com | GitHub: Yinhan-Zhang

Publications:

🌟 [ICCV 2025] MagicColor: Multi-instance Sketch Colorization (first author, accepted to ICCV 2025)

▶ Project Link: https://yinhan-zhang.github.io/color/

🌟 Follow-Your-Creation: Empowering 4D Creation through Video Inpainting (NeurIPS 2025, under review, co-author)

▶ Project Link: https://follow-your-creation.github.io/

To be released soon:

🌟 ScaleAdapter: An Efficient Fine-tuning Framework for Adapting Diverse Controls and Tasks to Diffusion Transformer (planned submission to AAAI 2026, first author, work done while interning at TeleAI)

🌟 ConceptColor: Multi-concept Sketch Colorization via Diffusion Transformer (planned submission to AAAI 2025, first author)

🌟 InstanceAnimate: Instance-aware Sketch Video Colorization (planned submission to ICLR 2025, first author, work done while interning at TeleAI)

Entrepreneurial Project

🏢 Pyoh Programming (派呦编程)

🌟 MiraclePlus (奇绩创坛) Fall 2024 Demo Day project

Introduction: Pyoh Technology develops innovative, practical hardware products for programming education. With these products, children learn programming through play and build logical thinking, which strengthens their programming ability and creativity. Compared with traditional programming education, Pyoh Technology’s hardware is more interactive and engaging, and this “edutainment” approach suits how today’s children learn. The creativity-focused education that Pyoh Technology advocates also helps it stand out in the highly competitive programming education market.

More information: https://pyoh.cn/index.html

Internship

Nov 2024 - Present

🏢 China Telecom Artificial Intelligence Research Institute (TeleAI) | AIGC Research Group

💼 Text-to-Video Research Intern

🎉 Key Contributions:

Data-driven Video Generation Optimization: Extended the CogVideoX/Wan2.1 models with multi-conditional inputs (camera parameters, 2D/3D pose, single/multi-person constraints) to achieve controllable human-motion video generation.

Foundational Model Development: Co-developed VAST1.0, China Telecom’s first video generation base model, focusing on two-stage, cross-modal text-to-video generation.

Data Pipeline Construction: Built an end-to-end video generation data pipeline, reducing data preprocessing time by 40% by integrating automated annotation tools.

July 2024 - June 2024

🏢 Tencent Hunyuan (混元) | Image Foundation Model Team

💼 AIGC Algorithm Intern

Text-to-Image Model Optimization and Multi-Round Dialogue Image Generation System (based on HunyuanDiT)

Objective: Enhance the text-image alignment of the HunyuanDiT model through data-driven optimization

🎉 Key Contributions:

  • Caption Optimization: Applied data-driven caption optimization, improving generation fidelity by 5% (measured via CLIP score)

  • Query Rewriting Engine: Developed a prompt engineering framework based on Qwen-VL that transforms ambiguous user queries into structured prompts (style + composition + detail), achieving 92% intent accuracy on 100K dialogue samples. Built a fine-tuning dataset of multi-turn dialogues.

  • Model Switching Strategy: Designed a lightweight classifier to dynamically select model branches (text2img/img2img) based on query complexity.

  • Bad-case Resolution: Classified failure modes and implemented a context-aware correction module.

Nov 2023 - Apr 2024

🏢 JD Retail | Technology Middleware-Algorithm Department (算法中台部)

💼 Large Model Algorithm Intern

Project: E-commerce Product Matching System (Text & Multi-modal)

🎉 Technical Contributions:

▶ Model Architecture:

Developed dual-text matching models using DeBERTa (fine-tuned on the e-commerce vertical) and InternLM-7B (few-shot learning), achieving semantic alignment of product titles and features.

Introduced unsupervised contrastive sampling:

  • Constructed 120K+ pseudo-labeled pairs (positives: same-item different titles; negatives: similar-category non-matching items)
  • Designed dynamic prompt templates (e.g., “[Brand]+[Function]+[Model] - Same Item?”) to boost long-tail category generalization.
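
The pair construction and prompt template above can be sketched roughly as follows. This is only an illustration, not the production code: the `Listing` fields, the `build_pairs` and `render_prompt` helpers, and the "A vs B" layout of the rendered prompt are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Listing:
    # All field names are hypothetical; the real schema is not described here.
    item_id: str    # identifier of the underlying product
    category: str
    brand: str
    function: str
    model: str
    title: str


def build_pairs(listings):
    """Return (positives, negatives) as lists of (Listing, Listing) pairs."""
    positives, negatives = [], []
    for a, b in combinations(listings, 2):
        if a.item_id == b.item_id and a.title != b.title:
            positives.append((a, b))   # same item, different titles
        elif a.item_id != b.item_id and a.category == b.category:
            negatives.append((a, b))   # similar category, not the same item
    return positives, negatives


def render_prompt(a, b):
    """Fill a '[Brand]+[Function]+[Model] - Same Item?' style template for a pair."""
    left = f"{a.brand}+{a.function}+{a.model}"
    right = f"{b.brand}+{b.function}+{b.model}"
    return f"{left} vs {right} - Same Item?"
```

Pairs from different categories are skipped entirely, since they would make trivially easy negatives.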

Dataset Engineering:

  • Built a multi-task hybrid dataset (search logs + product specs) and tuned the loss weight ratio (CE:Contrastive = 7:3), lifting category accuracy from 72% to 80% (8 points absolute, about 11% relative)
  • Implemented hard-negative mining, improving accuracy on the 10 most error-prone categories by 25%
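
A minimal sketch of how a 7:3 weighting between a cross-entropy term and a contrastive term can be combined is shown below. The margin-based pairwise form of the contrastive term, the `margin` value, and all function names are assumptions for illustration; the exact losses used in the project are not specified here.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example (target is a class index)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def contrastive_loss(distance, is_match, margin=1.0):
    """Margin-based pairwise loss (assumed form): pull matches together,
    push non-matches at least `margin` apart."""
    if is_match:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def combined_loss(logits, target, distance, is_match, w_ce=0.7, w_con=0.3):
    """Weighted sum using the 7:3 CE-to-contrastive ratio."""
    ce = cross_entropy(logits, target)
    con = contrastive_loss(distance, is_match)
    return w_ce * ce + w_con * con
```

Keeping the two weights as parameters makes it straightforward to sweep other ratios when tuning.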

▶ Multi-modal Matching (40 Core Categories)

Breakthrough: Replaced legacy “text-image sequential filtering” with Qwen-based multi-modal model

  • Our group annotated 80K+ cross-modal pairs (product images + titles + specs), defining alignment rules (e.g., “red dress” matches RGB 255-105-180)

  • Created “similar-but-not-same” dataset (e.g., mobile phones with different RAM versions) to enforce fine-grained discrimination

  • Designed “text-image-price” triple-branch fusion architecture on Qwen-VL, adding e-commerce prefixes (e.g., “[Product Match] Judge if the same:”)

  • Introduced a weighted alignment loss for core attributes (brand/model/specs), improving accuracy from 82% to 90% (8 points absolute, about 10% relative)

July 2023 - November 2023

🏢 Baidu Financial Services (百度金融) | AI Innovation Business Unit

💼 NLP Algorithm Intern

Project: Construction of Intent Recognition and Intelligent Customer Service Large Model in the Financial Domain

Background: This project aimed to build an intent recognition system and an intelligent customer-service large model for the financial domain, replacing the manual customer-service process in loan guidance.

🎉 Technical Contributions:

🤖 Intent Recognition

▶ Model Training: Trained an intent recognition model based on RoBERTa.
Data Optimization: Optimized the QA-pair training data. Together, these steps raised accuracy in recognizing users’ conversation intents to 0.85.

🤖 Intelligent Customer-Service Large Model

▶ Model Selection: Chose ChatGLM2 for the intelligent customer-service application.
Model Training and Updating: Trained the SFT (Supervised Fine-Tuning) and RW (Reward Modeling) models. After training the RW model, I updated the SFT model. As labeled feedback data accumulated, both models were continuously updated to improve performance.

April 2023 - May 2023

🏢 Daikin (China) Investment Co., Ltd.

💼 Control Algorithm Engineer, Technical Research and Development, Shenzhen, specializing in smart home systems.

🎉 Key Contributions:

  1. Integrated household appliances with the router terminal through Home Assistant to collect and retrieve data from all sensors in real time.
  2. Processed the collected data and developed logic control algorithms and prediction models to predict human activities and states in the environment, then automated the adjustment of environmental device parameters.

August 2022

🏢 Zhiyuan Education Co., Ltd.

💼 Engineer, Shenzhen (summer 2022)

🎉 Key Contributions:

During my internship, I used Raspberry Pi to build small teaching demos, such as a simple facial-recognition access control device and a robotic-arm sorting system that simulates a factory production line.

Contest

2023/2024 China Mobile Wutong Cup 🏆 National Third Prize | Top 10 | Team Leader

2024 Alibaba Cloud Large Model Agent Challenge 🏅 Technical Application Prize | TOP 20

2022 National College Student Internet+ Innovation and Entrepreneurship Competition 🏆 National Second Prize

2022 China Software Cup National College Student Software Design Competition 🏆 National Second Prize

2022 Tencent Rhinoceros Bird Open Source Program 🏅 Tencent Open Source Outstanding Student | The only undergraduate student selected

2022 BDCI Competition, Criminal Sentence Reduction Prediction 🏅 10th Place (10/482)

2023 Chinese Robotics and Artificial Intelligence Competition 🏆 National Third Prize | Team Leader

2022 National College Student Mathematical Modeling Competition 🏆 Second Prize, Shanxi Province


Project

Rhinoceros Bird Open Source Talent Development Program

2022.07 - 2022.09 Tencent Open Source Contributor

Project description:

Under a university-industry dual-mentorship model and building on Tencent’s open-source project Angel, I studied graph representation learning and several GNN algorithms under mentor guidance and explored research opportunities.

Main responsibilities:

  1. Under the guidance of the industry mentor, learned Hadoop, Spark, and algorithms related to graph representation learning.
  2. Deployed Tencent’s high-performance machine learning platform Angel locally, implemented the graph representation learning algorithm Struc2vec, and compared it with the classical DeepWalk and Node2vec algorithms. Achieved a 0.1 increase in average intra-cluster similarity for clustering and a 0.26 improvement in classification accuracy.
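
One way to compute an average within-cluster similarity for node embeddings is sketched below. It assumes cosine similarity averaged over all same-cluster pairs and nonzero embedding vectors; the metric actually used in the project is not specified, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_intra_cluster_similarity(embeddings, labels):
    """Average cosine similarity over all pairs of nodes sharing a cluster label.

    Raises ZeroDivisionError if no two nodes share a label.
    """
    sims = [
        cosine(embeddings[i], embeddings[j])
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    ]
    return sum(sims) / len(sims)
```

Running this on the embeddings produced by each algorithm, with the same cluster assignments, gives directly comparable scores.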

Volunteer Work

Hands-on Deep Learning, DataWhale Course Assistant

• The DataWhale team I belong to contacted Professor Li Mu and organized the “Hands-on Deep Learning” course, which attracted 9,027 students from 733 universities around the world.

• I was one of the course assistants, answering questions and helping students solve problems as they learned.

Open-Source Contribution

AI Open-Source Learning Projects:

-> LLM Cookbook (面向开发者的大模型手册, a large-model handbook for developers): https://github.com/datawhalechina/llm-cookbook | 16.2K stars 🌟🌟🌟
-> d2l-ai-solutions-manual (动手学深度学习习题解答, exercise solutions for Dive into Deep Learning): https://github.com/datawhalechina/ | 417 stars 🌟
-> Grape book (图深度学习, graph deep learning): https://github.com/datawhalechina/grape-book | 225 stars 🌟

Mutual AI:

I created this project to make the boring and difficult-to-understand algorithms of AI interesting, and to teach learners how to apply AI to real-life situations.

GitHub link: https://github.com/YinHan-Zhang/Mutual-AI