Yi Yang | Research

My research on NLP, AI, and Fintech is published in major business journals and in leading NLP/AI conferences and journals.

Forget Me If You Can: Auditing User Data Revocation in Recommendation Systems.
Zhihao Zhu, Yi Yang, Yangyang Fan, Defu Lian.
Information Systems Research. Forthcoming.

GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings.
Yixuan Tang, Yi Yang.
International Conference on Learning Representations (ICLR). 2026.

Robust Predictive Modeling under Unseen Data Distribution Shifts: A Methodological Commentary.
Hanyu Duan, Yi Yang, Ahmed Abbasi, Kar Yan Tam.
Information Systems Research. Forthcoming.

Revealing the Numeracy Gap: An Empirical Investigation of Text Embedding Models.
Ningyuan Deng, Hanyu Duan, Yixuan Tang, Yi Yang.
Findings of Conference of the European Chapter of the Association for Computational Linguistics (EACL Findings). Long Paper. 2026.

Learning from Earnings Calls: Graph-Based Conversational Modeling for Financial Prediction.
Yi Yang, Yixuan Tang, Yangyang Fan, Kunpeng Zhang.
Information Systems Research. Forthcoming.

Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework.
Hongyi Tang, Zhihao Zhu, Yi Yang.
Conference on Empirical Methods in Natural Language Processing (EMNLP). Long Paper. 2025.

FinMTEB: Finance Massive Text Embedding Benchmark.
Yixuan Tang, Yi Yang.
Conference on Empirical Methods in Natural Language Processing (EMNLP). Long Paper. 2025.

Evaluating and Aligning Human Economic Risk Preferences in LLMs.
Jiaxin Liu, Yixuan Tang, Yi Yang, Kar Yan Tam.
Conference on Empirical Methods in Natural Language Processing (EMNLP). Long Paper. 2025.

Adapting General-Purpose Embedding Models to Private Datasets Using Keyword-based Retrieval.
Yubai Wei, Jiale Han, Yi Yang.
Findings of Annual Meeting of the Association for Computational Linguistics (ACL Findings). Long Paper. 2025.

Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning.
Jiaqi Li, Yixuan Tang, Yi Yang.
Findings of Annual Meeting of the Association for Computational Linguistics (ACL Findings). Long Paper. 2025.

Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads.
Hanyu Duan, Yi Yang, Ahmed Abbasi, John P. Lalor, Kar Yan Tam.
Fifth Workshop on Trustworthy Natural Language Processing (TrustNLP). 2025.

PersonaFuse: A Personality Activation-Driven Framework for Enhancing Human-LLM Interactions.
Yixuan Tang, Yi Yang, Ahmed Abbasi.
Major Revision.

LLM-Measure: Generating Valid, Consistent, and Reproducible Text-Based Measures for Information Systems Research.
Hanyu Duan, Jiaxin Liu, Yi Yang, Kar Yan Tam.
Major Revision.

Designing Financial Text Embeddings with Persona-Based Supervision.
Yixuan Tang, Yi Yang.
Major Revision.

PatentsNET: A Graph Representation Learning Approach for Predicting Patent Economic Value and Litigation Risk.
Zhitao Yin, Yi Yang, Zhuoyi Peng, Zhenghan Zhang.
Major Revision.

Reading Between the Lines: A Text-based Deep Learning Approach for Understanding Company Dynamics.
Hanyu Duan, Yi Yang, Kar Yan Tam.
Major Revision.

Dual IT Strategy and Firm Performance: New Insights by Inferring Firm Strategies from a Deep Learning Approach.
Yi Yang, Chewei Liu, Terence Saldanha, Sunil Mithas.
Major Revision.

Hypergraph Modeling of Supply Chains: Unveiling the Impact of High-Order and Temporal Dynamics on Credit Risk Prediction.
Jialei Han, Yi Yang, Yangyang Fan, Zhongju Zhang.
Major Revision.

HoneyImage: Verifiable, Harmless, and Stealthy Dataset Ownership Verification for Image Models.
Zhihao Zhu, Jiale Han, Yi Yang.
Under Review.

TDDBench: A Benchmark for Training Data Detection.
Zhihao Zhu, Yi Yang, Defu Lian.
International Conference on Learning Representations (ICLR). 2025.

Adversarial Mixup Unlearning.
Zhuoyi Peng, Yixuan Tang, Yi Yang.
International Conference on Learning Representations (ICLR). 2025.

Divide-and-Contrast: A Text-based Method for Firm Market Risk Prediction.
Yi He, Yi Yang, Defu Lian, Kunpeng Zhang.
INFORMS Journal on Computing. 2025.

Efficient Multi-Expert Tabular Language Model for Banking.
Yue Guo, Wentao Zhang, Xiaojun Zhang, Vincent W Zheng, Yi Yang.
SIGKDD Conference on Knowledge Discovery and Data Mining - Applied Data Science Track (KDD). 2025.

Hierarchical Deep Document Model.
Yi Yang, John Lalor, Ahmed Abbasi, Daniel Zeng.
Transactions on Knowledge and Data Engineering (TKDE). 2024.

Exploring the Relationship between In-Context Learning and Instruction Tuning.
Hanyu Duan, Yixuan Tang, Yi Yang, Ahmed Abbasi, Kar Yan Tam.
Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP Findings). Long Paper. 2024.

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries.
Yixuan Tang, Yi Yang.
Conference on Language Modeling (COLM). 2024.

EconNLI: Evaluating Large Language Models on Economics Reasoning.
Yue Guo, Yi Yang.
Findings of Annual Meeting of the Association for Computational Linguistics (ACL Findings). Long Paper. 2024.

Beyond Surface Similarity: Detecting Subtle Semantic Shifts in Financial Narratives.
Jiaxin Liu, Yi Yang, Kar Yan Tam.
Findings of Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL Findings). Long Paper. 2024.

Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs.
Zhuoyi Peng, Yi Yang.
Findings of Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL Findings). Long Paper. 2024.

Should Fairness be a Metric or a Model? A Model-based Framework for Assessing Bias in Machine Learning.
John Lalor, Ahmed Abbasi, Kezia Oketch, Yi Yang, Nicole Forsgren.
ACM Transactions on Information Systems (TOIS). 2024.

TM-OKC: An Unsupervised Topic Model for Text in Online Knowledge Communities.
Dongcheng Zhang, Kunpeng Zhang, Yi Yang, and David Schweidel.
MIS Quarterly, 48, no. 3 (2024): 931-978.

InvestLM: A Large Language Model for Investment using Financial Domain Instruction Tuning.
Yi Yang, Yixuan Tang, Kar Yan Tam.

Predict the Future from the Past? On the Temporal Data Distribution Shift in Financial Sentiment Classifications.
Yue Guo, Chenxi Hu, Yi Yang.
Conference on Empirical Methods in Natural Language Processing (EMNLP). Long Paper. 2023.

Is ChatGPT a Financial Expert? Evaluating Language Models on Financial Natural Language Processing.
Yue Guo, Zian Xu, Yi Yang.
Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP Findings). Short Paper. 2023.

FinEntity: Entity-level Sentiment Classification for Financial Texts.
Yixuan Tang, Yi Yang, Allen H Huang, Andy Tam, Justin Tang.
Conference on Empirical Methods in Natural Language Processing (EMNLP). Short Paper. 2023.

Causal-Debias: Unifying Debiasing in Pretrained Language Models and Fine-tuning via Causal Invariant Learning.
Fan Zhou, Yuzhou Mao, Liu Yu, Yi Yang and Ting Zhong.
Annual Meeting of the Association for Computational Linguistics (ACL). Long Paper. 2023.

Extracting Actionable Insights from Text Data: A Stable Topic Model Approach.
Yi Yang and Ramanath Subramanyam.
MIS Quarterly, 47, no. 3 (2023).

Unlocking the Power of Voice for Financial Risk Prediction: A Theory-Driven Deep Learning Design Approach.
Yi Yang, Yu Qin, Yangyang Fan and Zhongju Zhang.
MIS Quarterly, 47, no. 1 (2023).

BARLE: Background-Aware Representation Learning for Background Shift Out-of-Distribution Detection.
Hanyu Duan, Yi Yang, Ahmed Abbasi, Kar Yan Tam.
Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP Findings). Long Paper. 2022.

Federated Meta Embedding Concept Stock Recommendation.
Zhuoyi Peng, Yi Yang, Liu Yang and Kai Chen.
IEEE Transactions on Big Data. 2022.

FinBERT—A Large Language Model for Extracting Information from Financial Text.
Allen Huang, Hui Wang and Yi Yang.
Contemporary Accounting Research 40, no. 2 (2023): 806-841.

sDTM: A Supervised Bayesian Deep Topic Model for Text Analytics.
Yi Yang, Kunpeng Zhang and Yangyang Fan.
Information Systems Research 34, no. 1 (2023): 137-156.

Unlocking Deeper Insights into Customer Engagement Through AI-Powered Analysis of Social Media Data.
P.K. Kannan, Yi Yang and Kunpeng Zhang.
Management and Business Review 3 (1-2), 108-115.

Benchmarking Intersectional Biases in NLP.
John Lalor, Yi Yang, Kendall Smith, Nicole Forsgren, Ahmed Abbasi.
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). Long Paper. 2022.

Auto-Debias: Debiasing Masked Language Models with Automated Biased Prompts.
Yue Guo, Yi Yang, Ahmed Abbasi.
Annual Meeting of the Association for Computational Linguistics (ACL). Long Paper. 2022.

Buy Tesla, Sell Ford: Assessing Implicit Stock Market Preference in Pre-trained Language Models.
Chengyu Chuang, Yi Yang.
Annual Meeting of the Association for Computational Linguistics (ACL). Short Paper. 2022.

Deep Cross-Attention Network for Crowdfunding Success Prediction.
Zhe Tang, Yi Yang, Wen Li, Defu Lian, Lixin Duan.
IEEE Transactions on Multimedia. 2022.

Constructing a Psychometric Testbed for Fair Natural Language Processing.
Ahmed Abbasi, David Dobolyi, John P Lalor, Richard G Netemeyer, Kendall Smith, Yi Yang.
Conference on Empirical Methods in Natural Language Processing (EMNLP). Long Paper. 2021.

Learning Numeracy: A Simple Yet Effective Number Embedding Approach Using Knowledge Graph.
Hanyu Duan, Yi Yang and Kar Yan Tam.
Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP Findings). Short Paper. 2021.

Rumor Detection on Social Media with Event Augmentations.
Zhenyu He, Ce Li, Fan Zhou and Yi Yang.
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 2021.

Identifying Market Structure: A Deep Network Representation Learning of Social Engagement.
Yi Yang, Kunpeng Zhang and P.K. Kannan.
Journal of Marketing. 2021.

FinBERT: A Pretrained Language Model for Financial Communications.
Yi Yang, Mark Christopher Siy UY, Allen Huang.

Generating Plausible Counterfactual Explanations for Deep Transformers in Financial Text Classification.
Linyi Yang, Eoin Kenny, Tin Lok James Ng, Yi Yang, Barry Smyth and Ruihai Dong.
International Conference on Computational Linguistics (COLING). 2020.

Analyzing Firm Reports for Volatility Prediction: A Knowledge-driven Text Embedding Approach.
Yi Yang, Kunpeng Zhang and Yangyang Fan.
INFORMS Journal on Computing. 2021.

Unifying Online and Offline Preference for Social Link Prediction.
Fan Zhou, Bangying Wu, Kunpeng Zhang, Yi Yang, and Harry Wang.
INFORMS Journal on Computing. 2020.

Interpreting Twitter User Geolocation.
Ting Zhong, Tianliang Wang, Fan Zhou, Goce Trajcevski, Kunpeng Zhang and Yi Yang.
Annual Meeting of the Association for Computational Linguistics (ACL). Short Paper. 2020.

Interpretable Operational Risk Classification with Semi-supervised Variational Autoencoder.
Fan Zhou, Shengming Zhang and Yi Yang.
Annual Meeting of the Association for Computational Linguistics (ACL). Short Paper. 2020.

Neural Topic Model with Attention for Supervised Learning.
Xinyi Wang and Yi Yang.
International Conference on Artificial Intelligence and Statistics (AISTATS). 2020.

What You Say and How You Say It Matters: Predicting Stock Volatility Using Verbal and Vocal Cues.
Yu Qin and Yi Yang.
Annual Meeting of the Association for Computational Linguistics (ACL). Long Paper. 2019.

Vec2Link: Unifying Heterogeneous Data for Social Link Prediction.
Fan Zhou, Bangying Wu, Yi Yang, Goce Trajcevski, Kunpeng Zhang, and Ting Zhong.
International Conference on Information and Knowledge Management (CIKM). 2018.

Improving Topic Model Stability for Effective Document Exploration.
Yi Yang, Shimei Pan, Yangqiu Song, Jie Lu, Mercan Topkara.
International Joint Conference on Artificial Intelligence (IJCAI). 2016.

The Stability and Usability of Statistical Topic Models.
Yi Yang, Shimei Pan, Jie Lu, Mercan Topkara and Yangqiu Song.
ACM Transactions on Interactive Intelligent Systems. 2016.

Beating the Artificial Chaos: Fighting OSN Spam Using Its Own Templates.
Tiantian Zhu, Hongyu Gao, Yi Yang, Kai Bu, Yan Chen, Doug Downey, Kathy Lee and Alok Choudhary.
IEEE/ACM Transactions on Networking. 2016.

Efficient Algorithm for Incorporating Knowledge into Topic Models.
Yi Yang, Doug Downey, Jordan Boyd-Graber.
Conference on Empirical Methods in Natural Language Processing (EMNLP). Long Paper. 2015.

Efficient Methods for Inferring Large Sparse Topic Hierarchies.
Doug Downey, Chandra Sekhar Bhagavatula, Yi Yang.
Annual Meeting of the Association for Computational Linguistics (ACL). Long Paper. 2015.

User-directed Non-disruptive Topic Model Update for Effective Exploration of Dynamic Content.
Yi Yang, Shimei Pan, Yangqiu Song, Jie Lu, and Mercan Topkara.
International Conference on Intelligent User Interfaces (IUI). 2015.
★Honorable Mention Award.

Spam ain’t as Diverse as It Seems: Throttling OSN Spam with Templates Underneath.
Hongyu Gao, Yi Yang, Kai Bu, Yan Chen, Doug Downey, Kathy Lee and Alok Choudhary.
Annual Computer Security Applications Conference (ACSAC). 2014.

A Systematic Framework for Sentiment Identification by Modeling User Social Effects.
Kunpeng Zhang, Yi Yang, Aaron Sun, Hengchang Liu.
International Conference on Web Intelligence (WI). 2014.

Incorporating Conditional Random Fields and Active Learning to Improve Sentiment Identification.
Kunpeng Zhang, Yusheng Xie, Yi Yang, Aaron Sun, Hengchang Liu, Alok Choudhary.
Neural Networks Journal. 2014.

Learning Representations for Weakly Supervised Natural Language Processing Tasks.
Fei Huang, Arun Ahuja, Doug Downey, Yi Yang, Yuhong Guo, and Alexander Yates.
Computational Linguistics Journal. 2014.

Overcoming the Memory Bottleneck in Distributed Training of Latent Variable Models of Text.
Yi Yang, Doug Downey and Alexander Yates.
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). Short Paper. 2013.