Zeyuan Zang 臧泽元

B.Eng. Student @ BUPT • Incoming Ph.D. @ ZGCA

About Me

I am a fourth-year undergraduate student majoring in Computer Science and Technology at the School of Future, Beijing University of Posts and Telecommunications (BUPT). I currently conduct research on Large Vision-Language Models (LVLMs) at the Center of Intelligence Science and Technology (CIST), advised by Prof. Xiaojie Wang.

My current research interests include:

Boosting the reasoning abilities of Large Vision-Language Models (LVLMs)
Vision-Language representation learning
Novel architectures inspired by cognitive heuristics
Improving LVLMs' reliability with high-quality data

Research & Publications

Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models

Yueyan Li, Chenggong Zhao, Zeyuan Zang, Caixia Yuan, Xiaojie Wang. Accepted by ICLR 2026. arXiv:2509.19191
Paper Code OpenReview

VL-DynaRefine: A Vision-Language Dynamic Refinement Approach for Visual Reasoning

Jing Ma, Haochen Sun, Zeyuan Zang, Fangxiang Feng, Caixia Yuan, Lei Ren, Huixing Jiang, Chen Wei, Xiaojie Wang. Accepted by ACM MM 2025. DOI: 10.1145/3746027.3755296
Paper

Crop-and-Prompt: Multi-Grained Prompting for Fine-Grained Visual-Language Understanding

Zeyuan Zang, Hanzi Wang. Accepted by CAIBDA 2025. DOI: 10.1109/CAIBDA65784.2025.11182630
Paper Code

Research Grants

Beijing Municipal Natural Science Foundation Undergraduate Research Project

Participant · Project No. QY24212 · 2024.11 - Present · Expected completion: 2026.05

Topic: Research and Application of Event Causality Identification Based on Causal Graphs

Responsible for implementing and evaluating a causal graph-based causal reasoning model.

“Future Scholar” Project, School of Future, BUPT

Principal Investigator · Project No. 2024WLXZ06 · 2024.09 - 2025.08 · Rating: Excellent

Topic: Optimization of Vision-Language Models for Fine-Grained Perception Tasks

Organized and executed the project, leading method design, implementation, and evaluation.

Selected Projects

Deepreader: Agentic Document Reading Toolkit (under development)

Code

Runs DeepSeek-OCR on a vLLM backend for document parsing, with added tweaks: concurrency control, lossless PDF with bounding boxes, and Markdown with clearer images.
Planning to integrate with LangChain to build an agentic document reading system.

HADAR: High-Altitude Object Dropping Detection, Alarm, and Record Filing System

On display at the WSIS+20 event and AI for Good Summit 2025, Geneva, Switzerland.
Honorary Mention (top 21/148) in the IEEE 2025 ComSoc Student Competition.
Report Code1 Code2

Developed a lightweight object-dropping detection algorithm that runs in real time at 6-10 FPS with 75% accuracy on a Raspberry Pi.
Designed and implemented an incident recording system for surveillance applications, composed of a responsive H5 frontend built with Vue.js and a scalable backend service built with Node.js.

Digital Dude: Multimodal Digital Human Interaction System

Completed during professional engineering training at Beijing ZX-CE Technology Co., Ltd.
Code

A multimodal digital human interaction frontend built with React, TypeScript, and WebRTC, supporting speech, gesture, and facial expression recognition as well as real-time conversation.

Education

Beijing University of Posts and Telecommunications

B.Eng. in Computer Science and Technology (School of Future) · 2022.09 - 2026.07 (expected)

GPA: 3.79/4 (Score: 90.92/100, Rank: 5/30) | IELTS: 7.5, CET-6: 615, CET-4: 669
Honors: 1st Class Scholarship (2025), Xiaomi Scholarship (1st class, 2024), Merit Student of Beijing (2024), Merit Student of BUPT (2024, 2025)

Northeast Yucai School

2-year Junior + 3-year Senior High School (Gifted Education Experimental Division) · 2017.09 - 2022.07

Graduated at 16.