Data Quality and Fairness in ML

Research on improving fairness and robustness in machine learning models through data quality management

UCSD Halıcıoğlu Data Science Institute (Oct 2023 - Present)
Research Assistant | Supervisor: Babak Salimi | La Jolla, CA

This project develops a framework for managing and improving data quality, with the goal of making machine learning models fairer and more robust. The work involves building tools to handle data biases and running experiments to assess how effective those tools are.

  • Developed and implemented a Python framework for specifying and injecting complex data quality issues into datasets, with the aim of addressing data biases and improving the fairness and accuracy of machine learning models.
  • Designed and executed experiments to evaluate the robustness of data cleaning tools and machine learning models against data quality issues such as missing values, selection bias, and outliers.
  • Collaborated on the development of a declarative language for pattern specification, allowing users to define intricate dependencies between dataset attributes to simulate real-world data corruption.
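
The kind of attribute-dependent corruption described in the bullets above can be illustrated with a minimal sketch. This is not the project's actual framework or its declarative language: `CorruptionRule`, `inject`, and `select_rows` are hypothetical names, the toy `group`/`income` data is made up, and only the Python standard library is assumed.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class CorruptionRule:
    """Hypothetical rule: corrupt `target` in rows that satisfy `condition`."""
    condition: Callable[[Row], bool]  # dependency on other attributes
    target: str                       # attribute to corrupt
    rate: float                       # chance of corrupting a matching row
    kind: str                         # "missing" or "outlier"
    scale: float = 10.0               # multiplier applied for outliers


def inject(rows: List[Row], rules: List[CorruptionRule], seed: int = 0) -> List[Row]:
    """Return a corrupted copy of `rows`; the input is left untouched."""
    rng = random.Random(seed)
    dirty = [dict(r) for r in rows]
    for rule in rules:
        if rule.kind not in ("missing", "outlier"):
            raise ValueError(f"unknown corruption kind: {rule.kind}")
        for row in dirty:
            # Skip rows that do not match, or that lose the random draw.
            if not rule.condition(row) or rng.random() >= rule.rate:
                continue
            if rule.kind == "missing":
                row[rule.target] = None
            elif row[rule.target] is not None:
                row[rule.target] *= rule.scale
    return dirty


def select_rows(rows: List[Row], keep_prob: Callable[[Row], float], seed: int = 0) -> List[Row]:
    """Simulate selection bias: keep each row with an attribute-dependent probability."""
    rng = random.Random(seed)
    return [dict(r) for r in rows if rng.random() < keep_prob(r)]


# Toy data: two groups of equal size with a numeric attribute.
rows = [{"group": "A" if i % 2 == 0 else "B", "income": 1000.0 + i} for i in range(200)]

rules = [
    # Income goes missing far more often for group B than for group A.
    CorruptionRule(lambda r: r["group"] == "B", "income", 0.5, "missing"),
    # A small share of group A incomes are inflated into outliers.
    CorruptionRule(lambda r: r["group"] == "A", "income", 0.05, "outlier"),
]
dirty = inject(rows, rules, seed=42)

# Group B is under-sampled relative to group A.
biased = select_rows(rows, lambda r: 0.9 if r["group"] == "A" else 0.3, seed=42)
```

Expressing each rule as data (a condition, a target attribute, a rate) rather than as ad hoc code is what makes such a specification declarative in spirit: the same injector can replay many different corruption scenarios against one clean dataset.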

Throughout the project, the aim has been a general framework that applies across different machine learning models and helps keep them robust and fair even when the data is biased or incomplete.

The project involves extensive experimentation and evaluation of various data cleaning tools and techniques.
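
One way an experiment of this kind might quantify the effect of corrupted data on fairness is to compare a group fairness metric before and after corruption. The source does not say which metrics the project uses; the demographic parity gap below is an assumption chosen for illustration, and the prediction lists are made-up toy values, not experimental results.

```python
from typing import Dict, Sequence


def demographic_parity_gap(preds: Sequence[int], groups: Sequence[str]) -> float:
    """Largest difference in positive-prediction rate between any two groups."""
    totals: Dict[str, int] = {}
    positives: Dict[str, int] = {}
    for pred, group in zip(preds, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Toy predictions from a hypothetical model trained on clean vs. corrupted data.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
preds_clean = [1, 1, 0, 0, 1, 1, 0, 0]  # both groups: positive rate 0.5
preds_dirty = [1, 1, 1, 0, 1, 0, 0, 0]  # A: 0.75, B: 0.25

gap_clean = demographic_parity_gap(preds_clean, groups)
gap_dirty = demographic_parity_gap(preds_dirty, groups)
```

A larger gap after corruption would suggest that the data quality issue, or the cleaning step applied to it, affects the groups unequally.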
