Email: danqing.zhang.personal AT gmail

Welcome to my homepage!

I am an MLE/researcher with 6 years of experience at Amazon and a PhD, specializing in NLP, LLMs, information retrieval, and recommendation systems. Currently, I am an Applied Science Lead/Manager at Amazon Ads, and I am skilled in formulating machine learning problems and prototyping scalable solutions. Before that, I worked as a Sr. Applied Research Scientist on the Amazon Search (A9.com) QU team from 2018 to September 2022. I received my PhD in Systems Engineering (Cyber-physical Systems) from the University of California, Berkeley, College of Engineering in May 2018, under the supervision of Professor Alexei Pozdnoukhov. Before coming to Berkeley, I obtained a B.S. in Urban Science and a secondary B.A. in Economics from Peking University in June 2014.
I am a quick learner who has successfully transitioned across various domains, primarily independently, from GIS & Economics to Smart Cities Research (focusing on spatial-temporal analysis and agent-based simulation), and then to NLP, Search/Ads, and Large Language Models (LLMs). My adaptability and motivation to iterate on ideas and efficiently execute tasks have been proven throughout my graduate study and career.
I have co-authored over 10 papers presented at top AI conferences, accumulating approximately 500 citations as of 02/2024. Furthermore, I have deployed five deep learning models from start to finish in Amazon's production systems. I also hold one granted US patent and have two US patents pending.

Highlighted Research

GraphRAD: A Graph-based Risky Account Detection System [KDD-MLG'18] My contributions to this research were critical in two respects. (1) I identified a novel and important research problem in graph-based fraud detection and machine learning systems: how to detect fraudster groups containing identified fraudsters and other closely related users for further investigation. (2) The resulting machine learning system has already been adopted and modified by tech giants such as Alibaba, Alipay, and Tencent, which speaks to its value and novelty.

Connected population synthesis for transportation simulation [TRC'19] My contribution to the field is significant for three key reasons. Firstly, my research on connected synthetic populations is the first of its kind and represents a new research direction for the field. My paper on this topic was accepted by the prestigious journal Transportation Research Part C: Emerging Technologies in 2019, and my proposed methods have been widely adopted by researchers worldwide, with 22 citations as of February 2023. Secondly, I am the first to utilize passively collected call records to generate a synthetic population with more detailed home locations, a major breakthrough that frees the synthetic population from discrete traffic analysis zones. Lastly, I am the first to effectively incorporate both social network information and location information from passively collected call records to define community spatial distribution patterns. This approach represents a significant advancement in the field and has the potential to improve the accuracy and effectiveness of urban simulations.

Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data [ACL'21] Weak supervision has been successful in NER, yet most of the current research is focused on deep NER models trained only with weak supervision. I propose a realistic situation where we have a modest amount of strongly labeled data and a significant volume of weakly labeled data. To harness both datasets and enhance the performance of deep NER models, I've introduced a new multi-stage computational strategy called NEEDLE. This includes weak label completion, a noise-sensitive loss function, and a concluding refinement using the strongly labeled data. My framework tackles the challenge where weakly labeled data might not necessarily boost model efficacy and pushes the boundaries of current NER research.

QUEACO: Borrowing Treasures from Weakly-labeled Behavior Data for Query Attribute Value Extraction [CIKM'21] I proposed QUEACO, which, to the best of my knowledge, is the first attempt to introduce a unified query attribute value extraction system in E-commerce, encompassing both query NER and AVN. My work stands out in three primary areas. Firstly, the system addresses both the named entity recognition (NER) and attribute value normalization (AVN) phases, whereas existing works mainly focus on NER; this matters because AVN is equally important but often neglected in the literature. Secondly, QUEACO leverages large-scale weakly-labeled behavior data to improve extraction performance with less supervision cost, which addresses the high cost of obtaining labeled data, a common problem in many NLP tasks. Thirdly, the query parsing model I developed was deployed by Amazon, resulting in a 38.2% increase in token-level coverage compared to the previous system. The signals generated from query parsing are now used for product reranking, leading to a 0.36% improvement in NDCG@16.


Highlighted AIGC Research

LLM for explainable relevance labeling [Pending US patent]

On Data Augmentation for Extreme Multi-label Classification [Cited ~30 times by 02/2024]

SEQZERO: Few-shot Compositional Semantic Parsing with Sequential Prompts and Zero-shot Models [NAACL'22 (Findings)]


Directions I am excited to work on

AI-first data labeling, ML testing

RAG system enhancement

multi-modal LLM

AI agent specialized in advanced tasks within a specific field


Expertise | Experience
large language model (LLM) | 2021/10 - Present
online algorithms, bandit algorithms | 2021/05 - Present
natural language processing, information extraction, text classification | 2018 - Present
deep learning | 2017 - Present
social network analysis, graph mining | 2014 - Present
urban informatics, smart cities | 2014 - 2019
data science, data analytics | 2013 - Present
location-based service (LBS), geodatabase, geoanalysis, remote sensing | 2013 - 2014
budget pacing, budget optimization | 2023/06 - 2024/03
bidding: autobidding, bid optimization | 2023/06 - 2024/03
mechanism design & measurement | 2023/06 - 2024/03
auction allocation, pricing | 2023/06 - 2024/03

News

2024
  • 07/28/2024 - San Francisco Marathon Second Half Marathon.

  • 03/17/2024 - Oakland Running Festival 5K.

  • 02/14/2024 - Invited to serve as Program Committee member for IEEE BigData 2024.

  • 02/07/2024 - Invited to review for Transportation Research Part E.

  • 02/05/2024 - Invited to serve as Reviewer for the first Conference on Language Modeling (https://colmweb.org/).

2023
  • 12/05/2023 - Invited to serve as Program Committee member for ICML 2024.

  • 11/20/2023 - Invited to serve as Reviewer for EACL 2024, the 18th Conference of the European Chapter of the Association for Computational Linguistics.

  • 10/20/2023 - Selected as an organizing committee member for the Amazon Ads LLM Hackathon.

  • 10/16/2023 - One paper was accepted by ICDE 2024.

  • 09/01/2023 - Invited to serve as Reviewer for the Search track of The Web Conference 2024.

  • 07/18/2023 - Invited to serve as Technical Reviewer for Science Publications (SciPub) at Amazon.

  • 07/13/2023 - Invited to serve as Program Committee member for AAAI 2024.

  • 06/30/2023 - Workshop proposal, Brand Understanding and Brand Shopping 2023, was accepted as part of AMLC (Amazon Machine Learning Conference) 2023.

  • 06/29/2023 - Invited to serve as Reviewer for ARR 2023 - June (EMNLP 2023).

  • 06/10/2023 - Invited to serve as Program Committee member for CIKM 2023, Short Paper track.

  • 05/21/2023 - Invited to serve as Program Committee member for CIKM 2023, Long Paper track.

  • 05/16/2023 - One paper was accepted by KDD 2023 (applied data science paper).

  • 04/23/2023 - Invited to serve as Program Committee member for SIGIReCom'23 (The 2023 SIGIR Workshop On eCommerce).

  • 03/27/2023 - Invited to serve as Reviewer for NeurIPS 2023.

  • 03/25/2023 - Thanks to the assistance of my husband and ChatGPT, I have successfully rebuilt my website, including my tech blog, which is now accessible online again.

  • 03/24/2023 - Invited to serve as Reviewer for ARR 2023 - February (ACL 2023).

  • 03/22/2023 - Invited to serve as Reviewer for ARR 2023 - February (ACL 2023).

  • 02/28/2023 - Invited to serve as Program Committee member for ECML/PKDD 2023.

  • 02/21/2023 - Invited to serve as Program Committee member for TheWebConf2023-Companion.

  • 02/19/2023 - Invited to review for Transportation Research Part A.

  • 02/02/2023 - Invited to review for International Journal of Computational Intelligence Systems.


2022
  • 12/23/2022 - Invited to serve as Reviewer for ICML 2023.

  • 11/24/2022 - One paper was accepted by LOG (Learning on Graphs Conference) 2022 (research long paper).

  • 10/03/2022 - Joined Amazon Ads Brand Understanding Team as Senior Applied Scientist.

  • 07/31/2022 - Invited to serve as Program Committee member for AAAI 2023.

  • 07/18/2022 - Invited to serve as Program Committee member for CIKM 2022.

  • 07/17/2022 - Invited to serve as Reviewer for ARR 2022 - July (EMNLP 2022).

  • 05/18/2022 - One paper was accepted by KDD 2022 (research long paper).

  • 04/07/2022 - One paper was accepted by NAACL 2022, findings (research long paper).

  • 03/30/2022 - Invited to serve as Reviewer for NeurIPS 2022.

  • 03/15/2022 - Invited to serve as Program Committee member for ECML/PKDD 2022.

  • 02/22/2022 - One US patent granted.

  • 02/17/2022 - Invited to serve as Reviewer for ARR 2022 - February (NAACL 2022).

  • 01/23/2022 - Invited to review for INFORMS Journal on Computing.

  • 01/18/2022 - One paper was accepted by WWW 2022 (research long paper).

  • 01/17/2022 - Invited to serve as Reviewer for ARR 2022 - January (NAACL 2022).

  • 01/09/2022 - Invited to serve as Program Committee member for WSDM'22 Workshop: Decision Making for Modern Information Retrieval System.


2021
  • 12/26/2021 - Invited to serve as Reviewer for ARR 2021 - September (ACL 2022).

  • 11/27/2021 - Invited to review for International Journal of Information Technology & Decision Making.

  • 11/12/2021 - Started organizing A9 ML Research Talk.

  • 10/10/2021 - Promoted to Senior Applied Scientist.

  • 09/08/2021 - Invited to review for International Journal of Computational Intelligence Systems.

  • 08/25/2021 - One paper was accepted by EMNLP 2021 (research long paper).

  • 08/09/2021 - One paper was accepted by CIKM 2021 (Applied Research Track).

  • 07/24/2021 - Invited to review for ICLR 2022.

  • 07/03/2021 - Served as subreviewer for CIKM 2021.

  • 06/21/2021 - Invited to review for International Transactions on Electrical Energy Systems.

  • 06/11/2021 - Invited to serve as Reviewer for NLPCC 2021.

  • 05/26/2021 - Invited to serve as Program Committee member for SIGIReCom'21.

  • 05/07/2021 - Started my substack, but later switched back to using GitHub pages.

  • 05/05/2021 - One paper was accepted by ACL-IJCNLP 2021 (research long paper).

  • 04/05/2021 - Invited to review for Transportation Research Part C.

  • 03/24/2021 - Invited to serve as Program Committee member for ECML/PKDD 2021.

  • 03/10/2021 - One paper was accepted by NAACL-HLT 2021 (research long paper).

  • 01/25/2021 - Started actively maintaining my personal website.


2020
  • 10/20/2020 - I started anonymously blogging about my experiences working in data science and my Ph.D. journey on a social media platform. Although I discontinued the blog in late 2021, I had accumulated almost 1k followers by then.

  • 07/30/2020 - Paper selected as best paper for SIGIR eCom'20.

  • 04/02/2020 - Started running my Twitter account for research on NLP and cyber-physical systems.


2019
  • 10/13/2019 - Promoted to Applied Scientist II.

  • 09/24/2019 - One paper was accepted by Transportation Research Part E.

  • 07/28/2019 - San Francisco Marathon First Half Marathon. Total run: 2:18:38, Ranking: 1084/2817 (Female).


2018
  • 12/23/2018 - One paper was accepted by Transportation Research Part C.

  • 08/27/2018 - Joined Amazon Search Query Understanding Team as Applied Scientist I.

  • 06/08/2018 - One paper was accepted by Knowledge Discovery and Data Mining (KDD) Mining and Learning with Graphs (MLG) Workshop.

  • 05/15/2018 - Graduated from the University of California, Berkeley with a Ph.D. in Systems Engineering.

  • 05/02/2018 - Started running my personal blog while I was reading Lilian Weng's tech blog. However, the blog went unmaintained and remained inactive until 2023.



My Chinese name 丹青 has a special meaning. According to its Wikipedia page: In Chinese painting, danqing (Chinese: 丹青; pinyin: dān qīng) refers to paintings on silk and Xuan paper. Danqing is painted with an ink brush, color ink, or Chinese pigments using natural plant, mineral, and both metal pigments and pigment blends. Danqing literally means "red and blue-green" in Chinese, or more academically, "vermillion and cyan"; they are two of the most used colors in ancient Chinese painting.