Mingqian Ma (马鸣谦)

I am a senior undergraduate student in the Computer Science and Engineering department at the University of Michigan, Ann Arbor. I'm also in the dual degree program with Shanghai Jiao Tong University, where I major in Electrical and Computer Engineering.

My research interests focus on training foundation models for science and general domains. I'm especially interested in pretraining foundation models for genomics and optics. More generally, I'm interested in applying machine learning methods to real-world applications. I've been advised by Dr. Guoqing Liu @MSR, Prof. L. Jay Guo @UMich EECS, and Prof. Xiaofeng Gao @SJTU SEIEE.

I mainly work on pretraining large-scale foundation models across domains; check out NatureLM and HybriDNA for details.

I'm joining the CMU Machine Learning Department (MLD) as a master's student this fall. If you are interested in my research or are seeking collaboration, please feel free to contact me.

Email / GitHub / Google Scholar / LinkedIn

News

[February-2025] Our HybriDNA paper and the NatureLM project I contributed to at MSR are now available on arXiv! Check them out!

[October-2024] Our paper was accepted to the Foundation Models for Science Workshop at NeurIPS 2024. See you in Vancouver this December!

[September-2024] Our survey paper on multilayer thin film design is under review. Check it out on arXiv!

[May-2024] I've joined the Microsoft Research AI4Science team as a research intern advised by Dr. Guoqing Liu. I'm working on pretraining large-scale foundation models for genomics.

Research

I'm interested in deep generative models for science and general domains, with a focus on sequence modeling problems.

HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model
Mingqian Ma, Guoqing Liu, Chuan Cao, Pan Deng et al.
arXiv, 2025
arXiv / website
Advances in natural language processing have inspired new approaches to modeling DNA, often called the “language of life.” However, DNA modeling requires handling ultra-long sequences with single-nucleotide precision and excelling in both generative and understanding tasks. We introduce HybriDNA, a decoder-only DNA language model that combines Transformer and Mamba2 architectures to efficiently process sequences up to 131kb. HybriDNA achieves state-of-the-art performance across 33 DNA understanding benchmarks and excels in generating synthetic regulatory elements. Our findings highlight its scalability from 300M to 7B parameters, demonstrating its potential to drive new discoveries in DNA research and applications.

NatureLM: Deciphering the Language of Nature for Scientific Discovery
NatureLM Team, Microsoft Research AI for Science
arXiv, 2025
arXiv / website
NatureLM, developed by Microsoft Research AI for Science, is a sequence-based science foundation model designed to unify multiple scientific domains, including small molecules, materials, proteins, DNA, and RNA. The model leverages the “language of nature” to enable scientific discovery through text-based instructions.

Solving Out-of-Distribution Challenges in Optical Foundation Models using Self-Improving Data Augmentation
Mingqian Ma, Taigao Ma, L. Jay Guo
FM4Science Workshop @ NeurIPS, 2024
paper
Optical multilayer thin film structures are widely used in many photonic applications. The key to enabling these applications is inverse design, which seeks to identify a structure that satisfies the desired optical response. We propose a self-improving data augmentation technique that leverages neural networks’ extrapolation ability. Using this method, we show significant improvements on various real-world design tasks with minimal fine-tuning, and the approach could potentially generalize to other scientific foundation models for inverse design.

Optical Multilayer Thin Film Structure Inverse Design: From Optimization to Deep Learning
Taigao Ma, Mingqian Ma, L. Jay Guo
under review, 2024
arXiv
A survey paper on optical multilayer thin film structure inverse design. The survey covers all aspects, from traditional optimization-based methods to state-of-the-art deep learning-enabled inverse design algorithms.

Encoder-Decoder Based Route Generation Model for Flexible Travel Recommendation
Jiale Zhang, Mingqian Ma, Xiaofeng Gao, Guihai Chen
IEEE Transactions on Services Computing, 2024
paper
An ML4CO (machine learning for combinatorial optimization) framework that recommends suitable routes for tourists while satisfying explicit constraints such as must-visit points and unavailable hours.

Design and source code from Jon Barron's website