2024 Early Career Collaboration Program Awardees
The Texas A&M Institute of Data Science (TAMIDS) Early Career Collaboration Program has selected seven faculty-led teams to receive funding for projects that aim to expand research collaboration with TAMIDS Thematic Labs and support individuals interested in Data Science, Artificial Intelligence (AI), and Machine Learning. Below are the awardees and the abstracts of their projects. Thank you to all the faculty who submitted proposals this year. Your dedication to research and collaboration strengthens our mission to use Data Science to solve challenges across all fields of study.
Awarded Projects
Quantifying Utility vs Privacy Tradeoffs in Digital Twins
Rui Tuo, College of Engineering
Raktim Bhattacharya, College of Engineering
Scientific Machine Learning Lab | Privacy, Data Science, ML and AI; Uncertainty Quantification
Digital twins are virtual representations of physical objects or systems that are created using data collected from sensors, high-fidelity simulations, and other sources. The use of digital twins often involves a tradeoff between utility and privacy. On one hand, digital twins can provide a wide range of benefits and efficiencies by allowing organizations to simulate, analyze, and predict the behavior of physical objects or systems. This can lead to cost savings, improved decision-making, and increased productivity. On the other hand, digital twins also have the potential to collect and store sensitive data about individuals or organizations, which can raise concerns about privacy. As a result, organizations and individuals need to carefully consider the potential benefits and risks of using digital twins, and take steps to balance the utility of these systems with the need to protect privacy. The team proposes to examine predictive machine learning models that enable uncertainty quantification to be used for privacy assessments. Synthetic noise will be added to the data to ensure the privacy guarantees of the predictive model. If successful, the proposed activities will lead to a novel framework and a set of methodologies for building optimal predictive models with privacy and utility guarantees.
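As a rough illustration of the utility versus privacy tradeoff described above, the sketch below adds synthetic noise to a handful of sensor readings and measures how much accuracy is lost. The abstract does not name a noise mechanism, so the Laplace mechanism from differential privacy is an assumption here, and the function names, the `sensitivity` value, and the sample readings are all hypothetical.

```python
import random
import statistics


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values, sensitivity, epsilon, rng):
    """Add Laplace noise with scale = sensitivity / epsilon to every reading.

    Assumption: a smaller epsilon means stronger privacy and therefore more noise.
    """
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


def mean_abs_error(original, noisy):
    """Utility loss, measured as the mean absolute deviation from the raw data."""
    return statistics.fmean(abs(a - b) for a, b in zip(original, noisy))


def tradeoff(values, sensitivity, epsilons, seed=0):
    """Map each privacy budget epsilon to the utility loss it causes."""
    results = {}
    for eps in epsilons:
        rng = random.Random(seed)  # same random draws for every epsilon
        noisy = privatize(values, sensitivity, eps, rng)
        results[eps] = mean_abs_error(values, noisy)
    return results


if __name__ == "__main__":
    readings = [20.1, 20.4, 19.8, 21.0, 20.6]  # hypothetical sensor data
    for eps, err in tradeoff(readings, sensitivity=1.0, epsilons=[0.1, 1.0, 10.0]).items():
        print(f"epsilon={eps}: mean absolute error={err:.3f}")
```

Because every privacy budget reuses the same random draws, the error shrinks as epsilon grows, which makes the tradeoff easy to see: tighter privacy costs accuracy.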
Urban Digital Twin Integration for Energy Optimization in Indoor-Outdoor Campus Environments
Ashrant Aryal, School of Architecture
Xinyue Ye, School of Architecture
Urban AI Lab | Digital twins, integrated built environment, smart cities, smart buildings
This project aims to develop an integrated digital twin for the built environment, merging indoor and outdoor spaces to enhance operations and energy efficiency at Texas A&M University. Traditional digital twins have focused separately on buildings or urban environments, missing crucial interdependencies. By capturing these connections, the project seeks to optimize energy use and demand flexibility, using the campus as a pilot testbed. The initiative involves creating a 3D virtual environment, integrating data from diverse sources, and conducting energy simulations to identify efficiency improvements. Collaborating with various stakeholders, the project will refine digital twin technologies and expand their application. The research tasks include framework development, digital twin creation, data integration, and energy analysis, aiming for a comprehensive understanding of the campus ecosystem. Led by Dr. Ashrant Aryal and Dr. Xinyue Ye, the project addresses national research gaps and aligns with TAMIDS initiatives, enhancing data science education and collaboration opportunities.
Human-Centered Data-Driven Decision Making for Integrated (Re)Zoning and Evacuation Planning
Xiaofeng Nie, College of Engineering
David Eckman, College of Engineering
Urban AI Lab | Zoning, Evacuation, Data-Driven Decision Making, Traffic Congestion
As the climate changes, natural disasters are occurring more frequently. To alleviate their negative impacts, developing effective and equitable disaster management strategies is critical. Mass evacuations are commonly carried out for many kinds of disasters, such as hurricanes, wildfires, and floods. According to FEMA, zone-based evacuation can be the most effective way to achieve evacuation goals. To facilitate zone-based evacuation, most coastal counties prone to hurricanes have delineated evacuation zones to prioritize evacuation sequences and lessen traffic congestion. Though strategic evacuation zoning decisions intertwine with subsequent operational zone-based evacuation decisions, these two decision processes are decoupled in both practice and scientific research. In this project, interdisciplinary and convergent research will be conducted to better design evacuation strategies through integrated evacuation zoning/rezoning and traffic assignment decisions. We will investigate two data-driven approaches: flow-based and simulation-based optimization. The approaches differ in how we model zone-based evacuation (i.e., one via a network flow model and the other via a simulation model) and how we solve the resulting two-stage stochastic program (i.e., traditional and simulation-based optimization). The two approaches could be used in a complementary way to validate and refine each other. Moreover, testbeds for hurricane and wildfire evacuations will be developed through collaborative efforts.
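For readers unfamiliar with the term, a two-stage stochastic program has the generic textbook form shown below. Mapping it onto this project is an assumption, not the team's stated formulation: $x$ would stand for the first-stage zoning or rezoning decisions, $\xi$ for a random disaster scenario, and $y$ for the second-stage evacuation traffic assignment made once the scenario is known. The symbols $c$, $q$, $W$, $T$, $h$, and $X$ are generic placeholders.

```latex
\begin{aligned}
\min_{x \in X} \quad & c^{\top} x + \mathbb{E}_{\xi}\left[ Q(x, \xi) \right], \\
Q(x, \xi) \;=\; \min_{y \ge 0} \quad & q(\xi)^{\top} y
  \quad \text{s.t.} \quad W y = h(\xi) - T(\xi)\, x .
\end{aligned}
```

In the flow-based approach, the recourse value $Q(x, \xi)$ would presumably come from a network flow model, whereas in the simulation-based approach it would be estimated by running a simulation model.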
Contextualizing Cervical Cancer Screening & Human Papillomavirus (HPV) Vaccine Information and Corresponding Risk Perceptions from YouTube
Shawn Chiang, School of Public Health
Ruihong Huang, College of Engineering
Data Justice Lab | Cancer prevention; YouTube; Multi-modal machine learning; young adult
In 2022, the President’s Cancer Panel called for renewed efforts to promote cervical cancer screening and prevention in order to reduce cancer mortality. Despite available screening programs, 1 in 4 age-eligible U.S. women are currently not screened for cervical cancer. Given that cervical cancer screening (e.g., Pap smear) recommendations start at age 21, social network sites (SNS) are a promising intervention channel for delivering cervical cancer-related health information to young adults. The 2-year study will examine cervical cancer discussions on YouTube to understand their impact on screening attitudes, using the Behavioral and Social Driver framework. It involves developing a computational model to detect misinformation in YouTube Shorts about the HPV vaccine and cervical cancer screening (Aim 1) and analyzing video features and audience reactions to understand engagement patterns and how misinformation differs from accurate content (Aim 2). Outcomes will support health communication material development and provide critical foundational data for external grant submissions on HPV vaccination promotion utilizing data science. Furthermore, the proposal will catalyze community engagement within TAMU by organizing educational workshops and seminars on the role of data science in cancer communication and by providing research data for student/class projects.
Developing a Computer Vision Data Pipeline for Bovine Respiratory Disease Prediction in Beef Cattle
Karun Kaniyamattam, College of Agriculture & Life Sciences
Yalong Pi, Texas A&M Institute of Data Science
Operational Data Science Lab | Bovine Respiratory Disease, Precision Livestock Technology, Computer vision, Machine Learning
The annual agricultural receipts from the US beef cattle systems are valued at $66 billion. Bovine Respiratory Disease (BRD) is the most devastating cattle disease, with an estimated annual economic impact of $2 billion. At present, the disease is diagnosed using the DART (Depression, Appetite, Respiration, and Temperature) scoring system, in which pen-riders assign scores manually, a highly subjective and inaccurate methodology. Hence, our team’s overarching goal is to develop a precision livestock technology-based DART score prediction methodology. We will use near-infrared technology-based computer vision to predict individual cattle temperature difference, a key predictor of BRD. This project will scale up and harness the capabilities of the already established precision livestock infrastructure at the Texas A&M Nutrition and Physiology Center, which tracks individual cattle feeding behavior (appetite) using depth cameras. In addition, data from other biosensors, such as accelerometers and sound-based respiratory distress detection, will be used. Sensor data will be captured, processed, and transmitted via communication networks to be stored in databases, where it will undergo further processing and analysis using machine learning algorithms. Successful implementation of this system will enable prediction of DART scores for individual cattle earlier than is possible with pen-rider-based DART scoring.
Modeling and Control of Grid-interactive Desalination Plants
Guanyu Tian, Texas A&M University at Galveston
Zheng O’Neill, College of Engineering
Digital Twins Lab | Desalination, demand response, data-driven modeling, model predictive control
This project aims to support preliminary studies for the modeling and control of next-generation grid-interactive desalination plants capable of participating coherently in grid operations. It comprises two major tasks. The first task involves developing a high-fidelity physics-informed data-driven power consumption model for desalination plants. Current desalination plant models primarily focus on detailed mechanical and hydraulic dynamics related to the desalination process, neglecting higher-level relationships between power consumption and freshwater output [4]. Therefore, it is imperative to create generalizable high-level power consumption models suitable for engaging desalination plants of various sizes and configurations in grid operations. We plan to generate data using a commercial desalination plant digital twin model and pioneer the creation of the first open-source dataset and data-driven model for desalination plant power consumption. The second task involves developing optimization-based control algorithms aimed at reducing the operation costs of desalination plants. This will entail integrating grid economic factors such as real-time prices and revenue from ancillary services into the problem formulation. To support further investigations and validation studies beyond this initial phase, we will actively seek external funding and collaborations based on the success of this project.
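To make the idea of price-aware operation concrete, the sketch below schedules freshwater production into the cheapest hours of a day. It is a deliberately simple greedy schedule, not the model predictive control or optimization formulation the project proposes. It assumes a constant energy intensity per cubic meter and a fixed hourly capacity, and the function name, parameters, and numbers are all hypothetical.

```python
def schedule_production(prices, demand_m3, max_m3_per_hour, kwh_per_m3):
    """Plan hourly water production to meet a demand at minimum energy cost.

    prices: electricity price for each hour, in $/kWh (hypothetical values)
    demand_m3: total freshwater to produce over the horizon, in cubic meters
    max_m3_per_hour: plant capacity per hour, in cubic meters
    kwh_per_m3: assumed constant energy use per cubic meter of water

    Returns (plan, cost), where plan[h] is the production in hour h.
    With a linear cost and a simple capacity limit, filling the cheapest
    hours first gives the minimum-cost plan.
    """
    if demand_m3 > max_m3_per_hour * len(prices):
        raise ValueError("demand exceeds plant capacity over the horizon")

    plan = [0.0] * len(prices)
    remaining = demand_m3
    # Visit hours from the cheapest to the most expensive.
    for hour in sorted(range(len(prices)), key=lambda h: prices[h]):
        if remaining <= 0:
            break
        plan[hour] = min(max_m3_per_hour, remaining)
        remaining -= plan[hour]

    cost = sum(p * kwh_per_m3 * q for p, q in zip(prices, plan))
    return plan, cost


if __name__ == "__main__":
    hourly_prices = [0.10, 0.03, 0.08, 0.02]  # $/kWh, illustrative only
    plan, cost = schedule_production(hourly_prices, demand_m3=250,
                                     max_m3_per_hour=100, kwh_per_m3=3.5)
    print(plan, round(cost, 2))
```

A fuller formulation would add the items the abstract mentions, such as revenue from ancillary services and a data-driven power consumption model, and would re-solve the problem as new price forecasts arrive.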
Developing a Multi-Source Data Integration, Processing, and Analytics System for a Robotic Platform
Mahendra Bhandari, College of Agriculture & Life Sciences
Haoyu Niu, Texas A&M Institute of Data Science
Agricultural Smart Data Lab | computer vision, agriculture, robotics, biomass
In recent years, there has been increased interest in using artificial intelligence (AI) and computer vision technologies in agriculture, especially to monitor crops, detect pests, and estimate productivity for better management decisions. However, an enormous amount of high-quality data is needed to train robust AI or computer vision algorithms. Common data collection platforms in agriculture include satellites, Unmanned Aerial Systems (UAS), and ground-based robotic platforms. Each of these platforms has its own advantages and disadvantages with respect to endurance, resolution, and area coverage, and they should be integrated to develop robust algorithms. In this project, we envision developing a hardware platform that seamlessly integrates multiple sensors, processes data, and provides output in real time. Additionally, we propose to develop a data processing pipeline and algorithms for early-season stand counts and multi-temporal biomass estimations as use-case scenarios. The data from this robotic platform will be validated against ground measurements. We hope that this platform can be integrated into farm equipment so that producers and extension agronomists can collect data and obtain results in real time. The platform will add further value by validating satellite datasets against high-resolution, more localized datasets.