(#equal contribution, *corresponding author)
1. Jiazhi Jiang, Jiangsu Du, Dan Huang, Zhiguang Chen, Yutong Lu, Xiangke Liao, "Full Stack Optimizing Transformer Inference on ARM Many-core CPU", TPDS, 2023. (CCF-A)
2. Jiazhi Jiang, Zijian Huang, Dan Huang, Jiangsu Du, Lin Chen, Zhiguang Chen, Yutong Lu, "Hierarchical Model Parallelism for Optimizing Inference on Many-core Processor Via Decoupled 3D-CNN Structure", TACO, 2023. (CCF-A)
3. Jiazhi Jiang, Dan Huang, Hu Chen, Yutong Lu, Xiangke Liao, "HTDcr: A Job Execution Framework for High Throughput Computing on Supercomputers", Science China Information Science, 2023. (CCF-A)
4. Rui Tian#, Jiazhi Jiang#, Jiangsu Du, Dan Huang, Yutong Lu, "Sophisticated Orchestrating Concurrent DLRM Training on CPU/GPU Platform", TPDS. (CCF-A)
5. Jiazhi Jiang, Jiangsu Du, Dan Huang, Dongsheng Li, Jiang Zheng, Yutong Lu, "Characterizing and Optimizing Transformer Inference on ARM Many-core Processor", ICPP, 2022. (CCF-B)
6. Jiazhi Jiang, Rui Tian, Jiangsu Du, Dan Huang, Yutong Lu, "MixRec: Orchestrating Concurrent Recommendation Model Training on CPU-GPU platform", ICCD, 2023. (CCF-B)
7. Jiazhi Jiang, Dan Huang, Jiangsu Du, Yutong Lu, Xiangke Liao, "Optimizing small channel 3D convolution on GPU with tensor core", Parallel Computing, 2022. (CCF-B)
8. Jiazhi Jiang, Zijian Huang, Dan Huang, Jiangsu Du, Yutong Lu, "Accelerating Inference of 3D-CNN on ARM Many-core CPU via Hierarchical Model Partition", DATE, 2023. (CCF-B)
9. Jiazhi Jiang, Hongbin Zhang, Jiangsu Du, Jinhui Wei, Dan Huang, Yutong Lu, "RTAI: Efficient Coupling Hybrid Workflow of Streaming AI and Ensemble Simulations on HPC Clusters", Euro-Par, 2024. (CCF-B)
10. Jiang Zheng#, Jiazhi Jiang#, Jiangsu Du, Dan Huang, Yutong Lu, "Optimizing Massively Parallel Sparse Matrix Computing on ARM Many-core Processor", Parallel Computing, 2023. (CCF-B)
11. Jiangsu Du, Jiazhi Jiang, Jiang Zheng, Hongbin Zhang, Dan Huang, Yutong Lu, "Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUs", TACO, 2023. (CCF-A)
12. Jiangsu Du, Jiazhi Jiang, Yang You, Dan Huang, Yutong Lu, "Handling Heavy-tailed Input of Transformer Inference on GPUs", ICS, 2022. (CCF-B)
13. Jiangsu Du, Jinhui Wei, Jiazhi Jiang, Shengan Cheng, Dan Huang, Yutong Lu, "Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference", PPoPP, 2024. (CCF-A)
14. Yuanxin Wei, Jiangsu Du, Jiazhi Jiang, Xiao Shi, Xianwei Zhang, Dan Huang, Nong Xiao, Yutong Lu, "APTMoE: Affinity-aware Pipeline Tuning for MoE Models on Bandwidth-constrained GPU Nodes", SC, 2024. (CCF-A)
15. Yuanxin Wei, Shengyuan Ye, Jiazhi Jiang, Xu Chen, Dan Huang, Jiangsu Du, Yutong Lu, "Communication-Efficient Model Parallelism for Distributed In-situ Transformer Inference", DATE, 2024. (CCF-B)
16. Jiangsu Du, Dongsheng Li, Yingpeng Wen, Jiazhi Jiang, Dan Huang, Xiangke Liao, Yutong Lu, "SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems", JCST, 2023. (CCF-B)
17. Wenlong Zhu, Jiazhi Jiang, Dan Huang, Nong Xiao, "ParM: Heterogeneous programming model based on domestic processor", HPC China, 2023.