Multimedia Information Processing Lab (MIPL), Peking University: Source Code

Source code

AesFormer: https://github.com/PKU-ICST-MIPL/AesFormer_ICML2026
AesFormer: Transform Everyday Photos into Beautiful Memories (ICML 2026).
TARA: https://github.com/PKU-ICST-MIPL/TARA_CVPR2026
Taxonomy-Aware Representation Alignment for Hierarchical Visual Recognition with Large Multimodal Models (CVPR 2026).
Venus: https://github.com/PKU-ICST-MIPL/Venus_CVPR2026
Venus: Benchmarking and Empowering Multimodal Large Language Models for Aesthetic Guidance and Cropping (CVPR 2026).
Fine-R1: https://github.com/PKU-ICST-MIPL/FineR1_ICLR2026
Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning (ICLR 2026).
MRA: https://github.com/PKU-ICST-MIPL/MRA_TIP
Multi-Resolution Alignment for Voxel Sparsity in Camera-Based 3D Semantic Scene Completion (TIP).
CausalFSFG: https://github.com/PKU-ICST-MIPL/CausalFSFG_TMM
CausalFSFG: Rethinking Few-Shot Fine-Grained Visual Categorization from Causal Perspective (TMM).
Bi-C²R: https://github.com/PKU-ICST-MIPL/Bi-C2R-TPAMI2026
Bi-C²R: Bidirectional Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification (TPAMI 2026).
HD²-SSC: https://github.com/PKU-ICST-MIPL/HD2-AAAI2026
HD²-SSC: High-Dimension High-Density Semantic Scene Completion for Autonomous Driving (AAAI 2026).
CKDA: https://github.com/PKU-ICST-MIPL/CKDA-AAAI2026
CKDA: Cross-modality Knowledge Disentanglement and Alignment for Visible-Infrared Lifelong Person Re-identification (AAAI 2026).
SPHERE: https://github.com/PKU-ICST-MIPL/SPHERE_ACMMM2025
SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion (ACM MM 2025).
Uni-FineParser: https://github.com/PKU-ICST-MIPL/Uni-FineParser_TPAMI2025
Human-centric Fine-grained Action Quality Assessment (TPAMI 2025).
DyFo: https://github.com/PKU-ICST-MIPL/DyFo_CVPR2025
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding (CVPR 2025).
PosterO: https://github.com/PKU-ICST-MIPL/PosterO-CVPR2025
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation (CVPR 2025).
DKC: https://github.com/PKU-ICST-MIPL/DKC-CVPR2025
DKC: Differentiated Knowledge Consolidation for Cloth-Hybrid Lifelong Person Re-identification (CVPR 2025).
MAI: https://github.com/PKU-ICST-MIPL/MAI_ICLR2025
MAI: A Multi-turn Aggregation-Iteration Model for Composed Image Retrieval (ICLR 2025).
Finedefics: https://github.com/PKU-ICST-MIPL/Finedefics_ICLR2025
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models (ICLR 2025).
FineSports: https://github.com/PKU-ICST-MIPL/FineSports_CVPR2024
FineSports: A Multi-person Hierarchical Sports Video Dataset for Fine-grained Action Understanding (CVPR 2024).
SIA-OVD: https://github.com/PKU-ICST-MIPL/SIA-OVD_ACMMM2024
SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection (ACM MM 2024).
FineFMPL: https://github.com/PKU-ICST-MIPL/FineFMPL_IJCAI2024
FineFMPL: Fine-grained Feature Mining Prompt Learning for Few-Shot Class Incremental Learning (IJCAI 2024).
Firzen: https://github.com/PKU-ICST-MIPL/Firzen_ICDE2024
Firzen: Firing Strict Cold-Start Items with Frozen Heterogeneous and Homogeneous Graphs for Recommendation (ICDE 2024).
FinePOSE: https://github.com/PKU-ICST-MIPL/FinePOSE_CVPR2024
FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models (CVPR 2024).
C2R: https://github.com/PKU-ICST-MIPL/C2R_CVPR2024
Learning Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification (CVPR 2024).
DMA: https://github.com/PKU-ICST-MIPL/DMA_TIFS2023
DMA: Dual Modality-Aware Alignment for Visible-Infrared Person Re-Identification (TIFS 2024).
Real20M: https://github.com/PKU-ICST-MIPL/Real20M_ACMMM2023
Real20M: A Large-scale E-commerce Dataset for Cross-domain Retrieval (ACM MM 2023).
HCL: https://github.com/PKU-ICST-MIPL/HCL_TMM2023
HCL: Hierarchical Consistency Learning for Webly Supervised Fine-Grained Recognition (TMM 2023).
LFR-GAN: https://github.com/PKU-ICST-MIPL/LFR-GAN_TOMM2023
LFR-GAN: Local Feature Refinement based Generative Adversarial Network for Text-to-Image Generation (TOMM 2023).
PosterLayout: https://github.com/PKU-ICST-MIPL/PosterLayout-CVPR2023
PosterLayout: A New Benchmark and Approach for Content-Aware Visual-Textual Presentation Layout (CVPR 2023).
DCR-ReID: https://github.com/PKU-ICST-MIPL/DCR-ReID_TCSVT2023
DCR-ReID: Deep Component Reconstruction for Cloth-Changing Person Re-Identification (TCSVT 2023).
MKVSE: https://github.com/PKU-ICST-MIPL/MKVSE-TOMM2023
MKVSE: Multimodal Knowledge Enhanced Visual-Semantic Embedding for Image-Text Retrieval (TOMM 2023).
SIM-Trans: https://github.com/PKU-ICST-MIPL/SIM-TRANS_ACMMM2022
SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization (ACM MM 2022).
MARS: https://github.com/PKU-ICST-MIPL/MARS_TCSVT2021
MARS: Learning Modality-Agnostic Representation for Scalable Cross-media Retrieval (TCSVT 2021).
UVCL: https://github.com/PKU-ICST-MIPL/UVCL_TCYB2020
Unsupervised Visual-textual Correlation Learning with Fine-grained Semantic Alignment (TCYB 2020).
WSDL: https://github.com/PKU-ICST-MIPL/WSDL_TCSVT2019
Fast Fine-grained Image Classification via Weakly Supervised Discriminative Localization (TCSVT 2019).
DASG: https://github.com/PKU-ICST-MIPL/DASG_TCSVT2019
Unsupervised Cross-media Retrieval Using Domain Adaptation with Scene Graph (TCSVT 2019).
DRLIH: https://github.com/PKU-ICST-MIPL/DRLIH_TMM2020
Deep Reinforcement Learning for Image Hashing (TMM 2020).
MCSCH: https://github.com/PKU-ICST-MIPL/MCSCH_TOMM2019
Sequential Cross-Modal Hashing Learning via Multi-scale Correlation Mining (TOMM 2019).
RCBT: https://github.com/PKU-ICST-MIPL/RCBT_TCSVT2020
Reinforced Cross-Media Correlation Learning by Context-Aware Bidirectional Translation (TCSVT 2020).
OSTG: https://github.com/PKU-ICST-MIPL/OSTG_TIP2020
Video Captioning with Object-Aware Spatio-Temporal Correlation and Aggregation (TIP 2020).
OA-BTG: https://github.com/PKU-ICST-MIPL/OABTG_CVPR2019
Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning (CVPR 2019).
AGHA: https://github.com/PKU-ICST-MIPL/AGHA_MMM2019
Hierarchical Vision-Language Alignment for Video Captioning (MMM 2019).
VHSM: https://github.com/PKU-ICST-MIPL/VHSM_TCYB2020
Visual-textual Hybrid Sequence Matching for Joint Reasoning (TCYB 2020).
MAVA: https://github.com/PKU-ICST-MIPL/MAVA_TIP2020
MAVA: Multi-level Adaptive Visual-textual Alignment by Cross-media Bi-attention Mechanism (TIP 2020).
HIL: https://github.com/PKU-ICST-MIPL/HIL_TOMM2020
HIL: Recognizing Cross-media Entailment with Heterogeneous Interactive Learning (TOMM 2020).
CDCR: https://github.com/PKU-ICST-MIPL/CDCR_TCSVT2019
CDCR: Quintuple-media Joint Correlation Learning with Deep Compression and Regularization (TCSVT 2019).
DFCL: https://github.com/PKU-ICST-MIPL/DFCL_JOS2019
DFCL: Cross-media Deep Fine-grained Correlation Learning (Journal of Software 2019).
Bridge-GAN: https://github.com/PKU-ICST-MIPL/Bridge-GAN_TCSVT2019
Bridge-GAN: Interpretable Representation Learning for Text-to-image Synthesis (TCSVT 2019).
CKD: https://github.com/PKU-ICST-MIPL/CKD_TMM2019
CKD: Cross-task Knowledge Distillation for Text-to-image Synthesis (TMM 2019).
CKRM: https://github.com/PKU-ICST-MIPL/CKRM_TCSVT2020
CKRM: Multi-level Knowledge Injecting for Visual Commonsense Reasoning (TCSVT 2020).
FGCrossNet: https://github.com/PKU-ICST-MIPL/FGCrossNet_ACMMM2019
FGCrossNet: A New Benchmark and Approach for Fine-grained Cross-media Retrieval (ACM MM 2019).
DADN: https://github.com/PKU-ICST-MIPL/DADN_TCSVT2019
DADN: Zero-shot Cross-media Embedding Learning with Dual Adversarial Distribution Network (TCSVT 2019).
MGAH: https://github.com/PKU-ICST-MIPL/MGAH_TMM2019
MGAH: Multi-pathway Generative Adversarial Hashing for Unsupervised Cross-modal Retrieval (TMM 2019).
TPCKT: https://github.com/PKU-ICST-MIPL/TPCKT_TMM2019
TPCKT: Two-level Progressive Cross-media Knowledge Transfer (TMM 2019).
CM-GANs: https://github.com/PKU-ICST-MIPL/CM-GANS_TOMM2019
CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning (TOMM 2019).
SSDH: https://github.com/PKU-ICST-MIPL/SSDH_TCSVT2019
SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval (TCSVT 2019).
DCKT: https://github.com/PKU-ICST-MIPL/DCKT_CVPR2018
Deep Cross-media Knowledge Transfer (CVPR 2018).
MHTN: https://github.com/PKU-ICST-MIPL/MHTN_TCYB2018
MHTN: Modal-adversarial Hybrid Transfer Network for Cross-modal Retrieval (TCYB 2018).
SCH-GAN: https://github.com/PKU-ICST-MIPL/SCHGAN_TCYB2018
SCH-GAN: Semi-supervised Cross-modal Hashing by Generative Adversarial Network (TCYB 2018).
MCSM: https://github.com/PKU-ICST-MIPL/MCSM_TIP2018
Modality-specific Cross-modal Similarity Measurement with Recurrent Attention Network (TIP 2018).
TCLSTA: https://github.com/PKU-ICST-MIPL/TCLSTA_TCSVT2018
Two-stream Collaborative Learning with Spatial-Temporal Attention for Video Classification (TCSVT 2018).
OPAM: https://github.com/PKU-ICST-MIPL/OPAM_TIP2018
Object-Part Attention Model for Fine-grained Image Classification (TIP 2018).
CCL: https://github.com/PKU-ICST-MIPL/CCL_TMM2018
CCL: Cross-modal Correlation Learning with Multi-grained Fusion by Hierarchical Network (TMM 2018).
QaDWH: https://github.com/PKU-ICST-MIPL/QaDWH_TMM2018
Query-adaptive Image Retrieval by Deep-Weighted Hashing (TMM 2018).
UGACH: https://github.com/PKU-ICST-MIPL/UGACH_AAAI2018
Unsupervised Generative Adversarial Cross-modal Hashing (AAAI 2018).
Saliency-guided-Faster-R-CNN: https://github.com/PKU-ICST-MIPL/Saliency-guided-Faster-R-CNN_ACMMM2017
Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN (ACM MM 2017).
CHTN: https://github.com/PKU-ICST-MIPL/CHTN_IJCAI2017
Cross-modal Common Representation Learning by Hybrid Transfer Network (IJCAI 2017).
DPEP: https://github.com/PKU-ICST-MIPL/DPEP
Cross-media Retrieval by Exploiting Fine-Grained Correlation at Entity Level (Neurocomputing 2017).
CMDN: https://github.com/PKU-ICST-MIPL/CMDN_IJCAI2016
Cross-media Shared Representation by Hierarchical Learning with Multiple Deep Networks (IJCAI 2016).
S2UPG: https://github.com/PKU-ICST-MIPL/S2UPG_TCSVT2016
Semi-Supervised Cross-Media Feature Learning with Unified Patch Graph Regularization (TCSVT 2016).
JRL: https://github.com/PKU-ICST-MIPL/JRL_TCSVT2014
Learning Cross-Media Joint Representation with Sparse and Semisupervised Regularization (TCSVT 2014).
JGRHML: https://github.com/PKU-ICST-MIPL/JGRHML_AAAI2013
Heterogeneous Metric Learning with Joint Graph Regularization for Cross-Media Retrieval (AAAI 2013).
CMCP: https://github.com/PKU-ICST-MIPL/CMCP_ICASSP2012
Cross-Modality Correlation Propagation for Cross-Media Retrieval (ICASSP 2012).
HSNN: https://github.com/PKU-ICST-MIPL/HSNN_MMM2012
Effective Heterogeneous Similarity Measure with Nearest Neighbors for Cross-Media Retrieval (MMM 2012).