About Me

Wang Jie (王杰)

I completed my undergraduate studies in the Department of Computer Science and Technology at Tongji University and am currently working as a research assistant at the MAPLE Lab at Westlake University. My research interests include multimodal large models, safety supervision mechanisms and simulations for autonomous driving, as well as virtual reality. Currently, I am primarily focused on exploring the applications of large models in fine-grained emotional computation and multimodal recommendation systems.

During my undergraduate studies, I led a team to establish two startups aimed at developing mental health applications based on extended reality technologies and large language models. Our team has successfully developed two applications, which are actively being improved and tested. The project has received support from two startup funds and has won over ten awards in various innovation and entrepreneurship competitions both domestically and internationally.

If you are interested in my research directions or startup projects, or are looking for any form of collaboration, please feel free to email me at 2054310@tongji.edu.cn.

Education

Tongji University

Bachelor's Degree (2020 - 2024)

SRIAS / Tongji

Research Assistant (2022 - 2023)

Westlake University

Research Assistant (2024 - )

......

Publications

SentiXRL: An Advanced Large Language Model Framework for Multilingual Fine-Grained Emotion Classification in Complex Text Environment (COLING 2025 under review)
Jie Wang, Yichen Wang, Zhilin Zhang, Kaidi Wang, Zhiyang Chen*
With the strong expressive capabilities of Large Language Models (LLMs), generative models effectively capture sentiment structures and deep semantics; however, challenges remain in fine-grained sentiment classification across multilingual and complex contexts. To address this, we propose the Sentiment Cross-Lingual Recognition and Logic Framework (SentiXRL), which incorporates two modules: an emotion retrieval enhancement module that improves sentiment classification accuracy in complex contexts through historical dialogue and logical reasoning, and a self-circulating analysis negotiation mechanism (SANM) that facilitates autonomous decision-making within a single model for classification tasks.
VTD: Visual and Tactile Database for Driver State and Behavior Perception (IEEE RAL Minor Revision)
Jie Wang, Mobing Cai, Zhongpan Zhu*, Member, IEEE, Hongjun Ding, Jiwei Yi and Aimin Du
In the domain of autonomous vehicles, the human-vehicle co-pilot system has garnered significant research attention. To address the subjective uncertainties in driver state and interaction behaviors, which are pivotal to the safety of human-in-the-loop co-driving systems, we introduce a novel visual-tactile perception method. Utilizing a driving simulation platform, a comprehensive dataset has been developed that encompasses multi-modal data under fatigue and distraction conditions. The experimental setup integrates driving simulation with signal acquisition, yielding 600 minutes of fatigue detection data from 15 subjects and 102 takeover experiments with 17 drivers. The dataset, synchronized across modalities, serves as a robust resource for advancing cross-modal driver behavior perception algorithms.
Perturbation Ontology based Graph Attention Networks (ICASSP 2025 under review)
Yichen Wang, Jie Wang, Fulin Wang, Xiang Li, Hao Yin, Bhiksha Raj*
In recent years, graph representation learning has undergone a paradigm shift, driven by the emergence and proliferation of graph neural networks (GNNs) and their heterogeneous counterparts. Heterogeneous GNNs have shown remarkable success in extracting low-dimensional embeddings from complex graphs that encompass diverse entity types and relationships. While meta-path-based techniques have long been recognized for their ability to capture semantic affinities among nodes, their dependence on manual specification poses a significant limitation. In this paper, we challenge the current paradigm by introducing ontology as a fundamental semantic primitive within complex graphs. Our goal is to integrate the strengths of both matrix-centric and meta-path-based approaches into a unified framework. We propose Perturbation Ontology-based Graph Attention Networks (POGAT), a novel methodology that combines ontology subgraphs with an advanced self-supervised learning paradigm to achieve a deep contextual understanding. Through extensive empirical evaluations, we demonstrate that POGAT significantly outperforms state-of-the-art baselines, achieving a groundbreaking improvement of up to 14.46% in F1-score for the critical task of link prediction and 15.76% in Micro-F1 for the critical task of node classification.
Perturbation Ontology based Graph Attention Networks (AAAI Student Program 2025 under review)
Yichen Wang, Jie Wang, Fulin Wang, Xiang Li, Hao Yin, Bhiksha Raj*
In recent years, graph representation learning has undergone a paradigm shift, driven by the emergence and proliferation of graph neural networks (GNNs) and their heterogeneous counterparts. Heterogeneous GNNs have shown remarkable success in extracting low-dimensional embeddings from complex graphs that encompass diverse entity types and relationships. While meta-path-based techniques have long been recognized for their ability to capture semantic affinities among nodes, their dependence on manual specification poses a significant limitation. In this paper, we challenge the current paradigm by introducing ontology as a fundamental semantic primitive within complex graphs. Our goal is to integrate the strengths of both matrix-centric and meta-path-based approaches into a unified framework. We propose Perturbation Ontology-based Graph Attention Networks (POGAT), a novel methodology that combines ontology subgraphs with an advanced self-supervised learning paradigm to achieve a deep contextual understanding. Through extensive empirical evaluations, we demonstrate that POGAT significantly outperforms state-of-the-art baselines, achieving a groundbreaking improvement of up to 14.46% in F1-score for the critical task of link prediction and 15.76% in Micro-F1 for the critical task of node classification.
Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering (BIBM 2025 final decision)
Zhilin Zhang, Jie Wang, Ruiyang Qin, Ruiqi Zhu and Xiaoliang Gong*
Medical Visual Question Answering (MedVQA) is a crucial task that combines computer vision and natural language processing to assist in clinical decision-making by answering questions based on medical images. In this paper, we propose the Intra-modal and Cross-modal Bilinear Attention Network (ICBAN), a novel model that integrates multi-scale feature extraction, self-attention mechanisms, and bilinear attention networks to effectively fuse visual and textual features. Our approach addresses the challenges of MedVQA by balancing accuracy and computational efficiency. Experimental results demonstrate that ICBAN can achieve superior performance compared to both traditional BAN and Transformer-based fusion methods. This work highlights the potential of advanced bilinear attention mechanisms in MedVQA.
Enhancing Autonomous Driving Safety: A Robust Stacking Ensemble Model for Traffic Sign Detection and Recognition (Sustainability 2025)
Yichen Wang, Jie Wang, Qianjin Wang
Accurate detection and classification of traffic signs play a vital role in ensuring driver safety and supporting advancements in autonomous driving technology. This paper introduces a novel approach for traffic sign detection and recognition by integrating the Faster RCNN and YOLOX-Tiny models using a stacking ensemble technique. The ensemble merges the strengths of both models, surpassing the limitations of the individual algorithms and achieving superior performance in challenging real-world scenarios. Our results show improved accuracy and efficiency in recognizing traffic signs across distant, close, complex, moderate, and simple settings, achieving a 4.78% increase in mean Average Precision (mAP) compared to Faster RCNN and improving Frames Per Second (FPS) by 8.1% and mAP by 6.18% compared to YOLOX-Tiny.

Intellectual Property

A safety officer workload estimation system and estimation method for 5G remote driving, China [P]. 202211634308.2, 2022.
Iland - A psychological healing program based on virtual display technology, China. Software Copyright Registration No. 2023SR0563014, 2023.
Sandbox Online - virtual sandbox system, China. Software Copyright Registration No. 2023SR0676789, 2023.

Honors and Awards

2023 Global Competition on Design For Future Education    Best Design Award
2023 Chinese Artificial Intelligence Application Scenario Innovation Challenge    Merit Award
2023 China-U.S. Young Maker Competition Shanghai Division    Third Prize
2023 The 16th National Student Software Innovation    Semifinalist (Top 60)
2023 TCCI AI for Brain Science Competition    First Prize
2023 The 8th Huichuang Youth Shanghai Student Projects Competition    First Prize
2022 IFLYTEK AI Development Competition    Merit Award (4/102)
2023 The 2nd China-Africa Youth Innovation and Entrepreneurship Competition    Merit Award

Internship & Summer School

Nokia Shanghai Bell 2022 Summer School (Nokia NILC-2022)    Best Progress Award    Jul. 2022 - Aug. 2022    [Link]
2023 Baidu AI Talent Development Campus    Graduate Excellence Award    Jan. 2022 - Nov. 2022    [Link]
2024 Westlake AI Open Courses    Best Project Runner-up Award    Jul. 2024 - Aug. 2024    [Link]

Entrepreneurship

Shanghai Besen Technology Co., Ltd. (BR No. 310110001229712, Certificate of Organization Code: MAC5RUKW-1)
Co-Founder, CEO
●  An incubated enterprise of the National Science and Technology Park and Venture Valley of Tongji University, backed by investment agency funds (Shanghai Tongji Technology Incubator Co.).

●  Developed a multi-platform application that uses VR/AR technology and deep learning to deliver traditional psychological healing techniques and reduce reliance on psychological consultants.

●  Established and led a startup team, expanding the team to nearly 50 people.

Project Introduction


Mental health among young people is deteriorating and the risk of psychological problems is growing, while mental health professionals are in short supply and public awareness of mental health remains low. Despite this, few applications address mental healing and stress relief.
While AR technology has advanced significantly, particularly in mobile applications, it is predominantly used in B2B settings, mainly for product promotion, with limited progress in other fields. Taking into account the current market landscape and technological considerations, we have developed a multi-platform XR interactive application focused on mental healing and stress relief, providing an innovative solution for psychological well-being.