Geyang (Albert) Xu
I am Geyang (Albert) Xu, a researcher and engineer focused on data-centric AI, ML system reliability, and governance. My goal is to help define the frameworks for reliability and governance in the next generation of ML and data pipelines.
I hold a Master’s degree in Computer Science from UC San Diego (GPA: 3.88/4.0) and a Bachelor’s degree from the University of Liverpool (GPA: 3.93/4.0).
Currently, I work as a Manufacturing Data Analyst at Fuyao Glass America, building clean analytics layers and monitoring production reliability. Previously, I was a Research Engineer at the UCSD Halıcıoğlu Data Science Institute (HDSI), where I co-authored research on stress-testing ML pipelines.
I am actively applying for PhD positions for Fall 2026. My research interests lie at the intersection of data management and machine learning, specifically:
- Data-Centric Robustness: Stress-testing ML pipelines with adversarial data corruption.
- ML Reliability & Governance: Systematic approaches to monitoring and explaining ML system failures.
- Data Provenance: Tracking and managing data quality in complex streaming systems.
Research Philosophy
At Fuyao, I spend my days building analytics layers, designing experiments, and monitoring production to make our data systems transparent, stable, and traceable. At 4Pexonic, I witnessed the other side of the data lifecycle: how large-scale bibliographic data is collected, cached, and aggregated to produce metrics like “research impact” and “disciplinary trends.” There, technical decisions such as caching strategies and latency handling directly shaped how the academic ecosystem was perceived.
These experiences reinforced a common theme: engineering roles are often about executing within existing rules and standards. I can optimize a coating line’s data pipeline or tune an analytics platform for low latency, but I have little influence over how we evaluate and govern these systems in the first place.
What truly fascinates me is not just maintaining stability, but defining the general frameworks for reliability and governance in the next generation of ML and data pipelines. How should we describe and inject data errors? How do we perform systematic stress testing on complex pipelines? How do we translate these results into evidence that engineers, stakeholders, and regulators can trust?
These questions go far beyond individual engineering practice; they require systematic research and community consensus. For me, pursuing a PhD is not just about obtaining another degree; it is the necessary path from being someone who executes existing standards to being someone who helps define them.