Zeyuan Zang 臧泽元

B.Eng. Student @ BUPT   •   Incoming Ph.D. @ ZGCA

About Me

I am a 4th-year undergraduate student majoring in Computer Science and Technology at the School of Future, Beijing University of Posts and Telecommunications (BUPT), currently conducting research on Large Vision-Language Models (LVLMs) at the Center of Intelligence Science and Technology (CIST), advised by Prof. Xiaojie Wang.

My current research interests include:

Research & Publications

Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models
VL-DynaRefine: A Vision-Language Dynamic Refinement Approach for Visual Reasoning
Crop-and-Prompt: Multi-Grained Prompting for Fine-Grained Visual-Language Understanding

Research Grants

Beijing Municipal Natural Science Foundation Undergraduate Research Project

Topic: Research and Application of Event Causality Identification Based on Causal Graphs

  • Implemented and evaluated a causal reasoning model built on causal graphs.

“Future Scholar” Project, School of Future, BUPT

Topic: Optimization of Vision-Language Models for Fine-Grained Perception Tasks

  • Organized and executed the project, leading method design, implementation, and evaluation.

Selected Projects

Deepreader: Agentic Document Reading Toolkit (under development)
  • Implemented document parsing with DeepSeek-OCR on a vLLM backend, adding concurrency control, lossless PDF output with bounding boxes, and Markdown export with clearer images.
  • Planning to integrate with LangChain to build an agentic document reading system.
HADAR: High-Altitude Object Dropping Detection, Alarm, and Record Filing System
  • Developed a lightweight object dropping detection algorithm that runs in real time on a Raspberry Pi at 6-10 FPS with 75% accuracy.
  • Designed and implemented an incident recording system for surveillance applications, composed of a responsive H5 frontend built with Vue.js and a scalable backend service built with Node.js.
Digital Dude: Multimodal Digital Human Interaction System
  • A multimodal digital human interaction frontend built with React, TypeScript, and WebRTC, supporting speech, gesture, and facial expression recognition as well as real-time conversation.

Education

Beijing University of Posts and Telecommunications
  • GPA: 3.79/4 (Score: 90.92/100, Rank: 5/30) | IELTS: 7.5, CET-6: 615, CET-4: 669
  • Honors: 1st Class Scholarship (2025), Xiaomi Scholarship (1st class, 2024), Merit Student of Beijing (2024), Merit Student of BUPT (2024, 2025)

Tech Stack

Python
PyTorch
C++
React
Node.js
TypeScript
Vite
Rust
OpenCV
MongoDB
SQLite
Kali
Git
Linux
LaTeX