
Text Mining Workshop Series
The Text Mining Workshop Series is a four-part, hands-on tutorial program that introduces social science researchers to computational text analysis, from foundational concepts to advanced modeling techniques. Led by TAMIDS’s Data Science Ambassadors, the series opens with an accessible introduction to text mining, how it complements traditional qualitative methods, and its value for social science research.
Participants will learn the full text-mining pipeline, core concepts such as corpora and document-term matrices, and essential preprocessing techniques, including tokenization, stopword removal, and lemmatization. Each session combines theory with practical Python exercises in Jupyter Notebook, so participants build both conceptual understanding and technical skills.
FOUNDATIONS & TEXT PREPROCESSING
Workshop One of Four
Friday, February 27 | 10 AM – 12 PM | Rudder Tower 601
Participants will understand what text mining is and why it’s valuable for social science research. They’ll learn how to prepare text data for analysis and perform basic text cleaning.
Key Topics:
- Text mining applications in social sciences
- Text mining vs. traditional qualitative methods
- The text mining pipeline
- Core concepts: corpus, document, token, document-term matrix
- Preprocessing techniques: tokenization, lowercasing, punctuation removal, stopword removal, lemmatization/stemming
- Creating word frequency distributions and visualizations
NOTE: Please install Jupyter Notebook before the workshop. If you would like help with Python, you may review the free PYTHON PRIMER on the Texas A&M Institute of Data Science’s website. Instructors will be available 30 minutes before the workshop starts to help anyone having trouble with the installation.
EXPLORATORY ANALYSIS & VISUALIZATION
Workshop Two of Four
Friday, March 27 | 10 AM – 12 PM | Rudder Tower 601
Participants will learn techniques for exploring and describing textual datasets, identifying patterns, and communicating findings through visualization. The first hour covers theory and key concepts; the second hour is hands-on practice in Python.
Key Topics:
- Word frequency analysis and comparative frequency
- N-grams (bigrams and trigrams) for phrase detection
- TF-IDF (Term Frequency-Inverse Document Frequency) for identifying distinctive terms
- Text visualization best practices and techniques
- Interpreting exploratory findings in a social science context
SUPERVISED TEXT CLASSIFICATION
Workshop Three of Four
Friday, April 10 | 10 AM – 12 PM | Rudder Tower 701
Participants will learn how to train models to categorize texts based on labeled examples and how to assess those models with evaluation metrics. The first hour covers theory and key concepts; the second hour is hands-on practice in Python.
Key Topics:
- Supervised learning paradigm and workflow
- Applications in social science research (sentiment analysis, content categorization, frame detection)
- Training data preparation and train-test splitting
- Feature extraction with TF-IDF and bag-of-words
- Classification algorithms: Naive Bayes and Logistic Regression
- Evaluation metrics: accuracy, precision, recall, F1-score, confusion matrices
TOPIC MODELING & DISCOVERY
Workshop Four of Four
Friday, April 24 | 10 AM – 12 PM | Rudder Tower 510
Participants will learn how to identify latent themes in large text collections using unsupervised machine learning and how to interpret and validate these findings. The first hour covers theory and key concepts; the second hour is hands-on practice in Python.
Key Topics:
- Unsupervised learning and topic modeling concepts
- Latent Dirichlet Allocation (LDA) intuition and implementation
- Choosing the number of topics (K) using coherence metrics and interpretability
- Systematic topic interpretation: words, documents, and labels
- Evaluating topic quality and distinctiveness
- Document-level and corpus-level topic analysis
Workshop Lead

Walid El Mansour
Senior Domain Data Science Ambassador
Department of Educational Administration & Human Resource Development
welmansour@tamu.edu



