Goals and Scope: The Texas A&M Institute of Data Science (TAMIDS) convened a workshop on Operational Data Science on 2/11/2019. The goal of the workshop was to promote engagement between Data Science research and the challenges of Data Analytics practice in the operations of Texas A&M University. Topics included: education metrics; athletics performance, engagement and business; transportation; libraries; facilities management; and network security.
Schedule:
Introduction
In this short welcome to the workshop, the TAMIDS Director, Dr. Nick Duffield, outlines his vision for Texas A&M as a living laboratory for applied Data Science, with opportunities to enhance the operations of Texas A&M that can serve as a proving ground for Data Science outreach. Slides
Session 1: Venues, Transportation & Geospatial Analysis
This talk will focus on the parking and transit data streams Transportation Services collects, the challenges associated with interpreting those data, and the successes we have achieved in meeting our goal of making parking and transit at Texas A&M a pleasurable experience. Slides
This talk will explore the ways that analytics and data science can be used to improve the operations of AggieMap, the official campus map of Texas A&M University. The talk will introduce attendees to the data sources and types which AggieMap uses, creates, and captures, and provide opportunities for the TAMU Data Science community to collaborate to improve AggieMap functions, user experiences, and community value. Slides
Geospatial big data, such as location-based social media, are becoming increasingly significant in understanding, monitoring, and managing human behaviors under emergencies, such as natural hazards. We explore the applications of big social media data in the preparedness, response, and recovery phases of the emergency management cycle, including monitoring social-geographical disparities, predicting post-disaster damage, surveying human behaviors, and conducting real-time emergency rescue.
This talk will discuss the need to nurture a replication standard for research. Such a standard could help overcome challenges related to collaboration and reproducibility. The talk will also cover challenges of data analytics experienced during applied data science projects within the Institute for Sustainable Communities and the Hazard Reduction and Recovery Center. The projects include spatial-temporal models that predict how natural hazards impact infrastructure and people. Slides
Accurate real-time traffic prediction has a key role in intelligent transportation systems. Building a prediction model for transportation networks is computationally challenging because of complex spatio-temporal dependencies and the large size of the graph. We propose a divide-and-conquer approach to traffic prediction which combines a novel clustering algorithm with new advances in graph signal processing. Slides
10:45 Discussion
11:00 to 11:30 Break with Refreshments
Session 2: Privacy, Personalization, and Awareness
Maintaining privacy and confidentiality of personal data while having sufficient information for meaningful use requires a well-orchestrated system that was designed with privacy in mind from the get-go. The appropriate design lies in the art of balancing usability and privacy through privacy by design. Usable and affordable privacy requires all stakeholders, including the public, to do their due diligence, which is enforced through transparency and accountability. Slides
Federal agencies collect and use data for a variety of programs and initiatives across sectors. Yet cross-sector data use for research or public health is challenging, in part because the US lacks a unified legal framework. Inconsistent laws place different restrictions on sensitive data. This presentation will evaluate federal data sharing laws and compare the federal approach to data protection with the approach of the European Union’s new General Data Protection Regulation. Slides
Wearable sensors can track patients and monitor clinical risk factors. How we determine what is necessary and how to make use of these data remains a challenge. Our approach to bridging these areas is to mine rich clinical data, such as those found in electronic health records, to inspire the design of new sensors for participant tracking. Topics include the design of new sensors, the challenges of validating ground truth data in uncontrolled environments, and personalization. Slides
12:30 Discussion
12:45 to 2:00 Lunch (provided to registrants)
Session 3: Academic and Athletic Performance
In Academic Services we seek to examine the use of Data Science to aid understanding in three broad areas. First, the effect of marketing channels (direct mail, social media, post, email) on student behavior as it relates to use of services, i.e., financial aid, career center, and matriculation behaviors. Second, we seek to gain insights into matriculation rates of students as a function of demographics and major demand. Finally, we wish to discern the variables that influence successful completion of courses and course pathway choices. This talk reviews some of the challenges and successes in understanding these problems. Slides
The sport performance division of the Athletics Department is interested in communicating with faculty, staff, and students who may be interested in learning more about the unique challenges that we face when collecting, analyzing, and interpreting data. This short talk will touch upon our department structure and our challenges, and will open up the lines of communication for future collaboration. Slides
We have an increasing ability to collect large amounts of relevant data on athletes' training load and physiological responses. Beyond the challenges of managing billions of data points, we seek to establish automated analytical tools that: 1. Learn individuals' unique responses; 2. Integrate many variables; 3. Provide real-time feedback on deviations from normal responses in an actionable format; and 4. Provide new insight into athlete status to optimize training and maximize performance. Slides
2:45 Discussion
3:00 to 3:15 Break with Refreshments
Session 4: Infrastructure, Information, and Intelligence
This talk describes different ways in which data science has helped solve cybersecurity problems and surveys the history of the relationship between cybersecurity and data science.
Every phone in Kyle Field is telling a unique story about its owner. But with 100,000 smartphones all “talking” at once, these stories are difficult to hear and even harder to understand. Between the Wi-Fi network, ticketing, and point-of-sale systems, we have access to billions of discrete data points at every event. These data hold the behavioral narratives of our fans. If data science can help decipher these stories, we can use new insights to improve our business. Slides
Not much is known about which machine learning (ML) models are appropriate for which marketing problems, how the models perform, and what insights can be generated from them. In this paper, we review supervised, unsupervised, reinforcement, and deep learning models and their applications to marketing problems. We develop, estimate, and evaluate ML models using an omnichannel retail dataset. We compare prediction accuracy across the models and derive critical marketing and business insights.
5:10 Discussion
Conclusion
5:20 Nick Duffield (TAMIDS & ECE): Wrap up and next steps
5:30 Finish