Mingqian Ma (马鸣谦)
I am a senior undergraduate student in the Computer Science and Engineering department at the University of Michigan, Ann Arbor. I'm also in the dual-degree program with Shanghai Jiao Tong University, where I major in Electrical and Computer Engineering.
My research interests focus on training foundation models for scientific and general domains. I'm especially interested in pretraining foundation models in genomics and optics. More generally, I'm interested in applying machine learning methods to real-world scenarios. I've been advised by Dr. Guoqing Liu @MSR, Prof. L. Jay Guo @UMich EECS, and Prof. Xiaofeng Gao @SJTU SEIEE.
I'm looking for a PhD position in the fields of AI for Science and NLP. If you are interested in my research, please feel free to contact me.
Email /
GitHub /
Google Scholar /
LinkedIn
|
|
News
[February-2025] Our HybriDNA paper and the NatureLM project I contributed to at MSR are now online on arXiv! Check them out!
[October-2024] Our paper was accepted to the Foundation Models for Science Workshop at NeurIPS 2024. See you in Vancouver this December!
[September-2024] Our survey paper on multilayer thin film design is under review. Check it out on arXiv!
[May-2024] I've joined the Microsoft Research AI4Science team as a research intern advised by Dr. Guoqing Liu. I'm working on pretraining large-scale foundation models in genomics.
|
Research
I'm interested in deep generative models for scientific and general domains, with a focus on sequential modeling problems.
|
|
HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model
Mingqian Ma, Guoqing Liu, Chuan Cao, Pan Deng et al.
ArXiv, 2025
arxiv /
website /
Advances in natural language processing have inspired new approaches to modeling DNA, often called the “language of life.” However, DNA modeling requires handling ultra-long sequences with single-nucleotide precision and excelling in both generative and understanding tasks. We introduce HybriDNA, a decoder-only DNA language model that combines Transformer and Mamba2 architectures to efficiently process sequences up to 131kb. HybriDNA achieves state-of-the-art performance across 33 DNA understanding benchmarks and excels in generating synthetic regulatory elements. Our findings highlight its scalability from 300M to 7B parameters, demonstrating its potential to drive new discoveries in DNA research and applications.
|
|
NatureLM: Deciphering the Language of Nature for Scientific Discovery
NatureLM Team, Microsoft Research AI for Science
ArXiv, 2025
arxiv /
website /
NatureLM, developed by Microsoft Research AI for Science, is a groundbreaking sequence-based science foundation model designed to unify multiple scientific domains, including small molecules, materials, proteins, DNA and RNA. This innovative model leverages the “language of nature” to enable scientific discovery through text-based instructions.
|
|
Solving Out-of-Distribution Challenges in Optical Foundation Models using Self-Improving Data Augmentation
Mingqian Ma, Taigao Ma, L. Jay Guo
FM4Science@NeurIPS-W, 2024
paper /
Optical multilayer thin film structures are widely used in many photonic applications. A key step in enabling these applications is inverse design, which seeks a suitable structure that satisfies the desired optical responses. We propose a self-improving data augmentation technique that leverages neural networks’ extrapolation ability. Using this method, we show significant improvement on various real-world design tasks with minimal fine-tuning, and the approach can potentially be generalized to inverse scientific foundation models.
|
|
Optical Multilayer Thin Film Structure Inverse Design: From Optimization to Deep Learning
Taigao Ma, Mingqian Ma, L. Jay Guo
under review, 2024
arxiv /
A survey of optical multilayer thin film structure inverse design. The survey covers approaches ranging from traditional optimization-based methods to state-of-the-art deep learning-enabled inverse design algorithms.
|
|
Encoder-Decoder Based Route Generation Model for Flexible Travel Recommendation
Jiale Zhang, Mingqian Ma, Xiaofeng Gao, Guihai Chen
IEEE Transactions on Services Computing, 2024
paper /
An ML4CO framework that recommends suitable routes for tourists while satisfying explicit constraints such as must-visit points and unavailable hours.
|
|