Projects

A multi-modal, multi-diagnostic approach to language-based inference of mental disorders

Together with Lasse Hansen and Riccardo Fusaroli from Aarhus University, I am developing text-, audio-, and combined text-and-audio models for language-based inference of psychiatric disorders in multimodal and multiclass settings.
We have engineered a number of baseline models (using XGBoost on text and audio features) as well as transformer-based architectures, and trained them to predict clinical diagnoses for a cohort of individuals diagnosed with ASD, schizophrenia, or major depressive disorder, along with matched controls. In our forthcoming manuscript, we show that performance in multiclass settings decreases significantly compared to binary (diagnosis vs. control) prediction problems, highlighting the need for more research (and larger datasets!) aimed at improving the specificity and the real-world clinical utility of language- and voice-based diagnostic approaches.
We also show that ensemble approaches (text + audio) can improve specificity in multiclass settings, efficiently leveraging information from multiple modalities at a low computational cost.
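As a rough sketch of the kind of baseline involved (not our actual pipeline; features and labels below are placeholders rather than real data), a multiclass XGBoost model over concatenated text and audio features looks roughly like this:

```python
# Hypothetical sketch: a multiclass XGBoost baseline over early-fused text and
# audio features. All features, labels, and dimensions are placeholders.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 400
X_text = rng.normal(size=(n, 128))   # e.g., text-derived features
X_audio = rng.normal(size=(n, 64))   # e.g., acoustic features
X = np.hstack([X_text, X_audio])     # simple early fusion of the two modalities
y = rng.integers(0, 4, size=n)       # 0 = control, 1 = ASD, 2 = schizophrenia, 3 = MDD

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = XGBClassifier(objective="multi:softprob", eval_metric="mlogloss")
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```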
NLP psychiatry clinical diagnostics machine learning language

Neuroscout: a platform for large-scale naturalistic fMRI research

At the Psychoinformatics Lab, I have been contributing to the development of Neuroscout, an end-to-end platform for the analysis of naturalistic fMRI data. You can read more about Neuroscout in our eLife paper: https://elifesciences.org/articles/79277. I am focusing on expanding Neuroscout’s annotation set by implementing feature extraction pipelines that use pretrained deep learning models (e.g., from HuggingFace’s transformers and TensorFlow Hub) in pliers.
I contributed to validating the platform and demonstrating its potential to increase the generalizability of neuroimaging findings through a series of large-scale meta-analyses presented in the paper and available as a Jupyter book here.
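As a rough illustration of what such a feature extraction pipeline does under the hood (this sketch calls HuggingFace’s transformers directly, with a placeholder model and inputs, rather than going through the pliers extractor interface):

```python
# Illustrative sketch only: pooled activations from a pretrained transformer for
# each stimulus segment. The actual Neuroscout pipelines wrap this kind of
# extraction as pliers extractors; the model and inputs here are placeholders.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

segments = ["the scene opens on a quiet street", "two characters start to argue"]
with torch.no_grad():
    for text in segments:
        inputs = tokenizer(text, return_tensors="pt")
        hidden = model(**inputs).last_hidden_state   # (1, n_tokens, hidden_size)
        features = hidden.mean(dim=1).squeeze(0)     # mean-pooled segment embedding
        print(text, features.shape)
```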
neuroimaging research methods machine learning open-source

Text transformer for context-aware encoding

This project focuses on training transformer encoders whose representations incorporate information about higher-order context, i.e., characteristics of the author and/or the pragmatic context. We feed models a target sequence and a number of ‘context’ sequences (e.g., text from the same author, or from the same subreddit) as a single example, and train them on a variant of MLM where the MLM head is fed a combination of token-level representations of the target sequence and an aggregate representation of the context sequences.

We experiment with three DistilBERT-inspired architectures: a bi-encoder (where context and target are fed to two separate encoders), a ‘batch’ encoder (a single encoder with added context-aggregation and target-context combination layers), and a hierarchical encoder (which applies attention across [CLS] tokens in between standard transformer layers to integrate information across context and target sequences). We evaluate the benefits of this training protocol both intrinsically, by comparing MLM performance against no-context and random-context training, and on extrinsic tasks.
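As a toy sketch of the ‘batch’ encoder’s combination step (dimensions, layers, and the specific combination operation below are illustrative assumptions, not the project’s actual implementation):

```python
# Toy sketch (placeholder dimensions, not the project code): an aggregate of
# context-sequence representations is combined with token-level target
# representations before the MLM head.
import tensorflow as tf

seq_len, n_contexts, hidden, vocab = 128, 4, 768, 30522

target_tokens = tf.keras.Input(shape=(seq_len, hidden))     # encoder output for the target
context_reprs = tf.keras.Input(shape=(n_contexts, hidden))  # one vector per context sequence

# Aggregate contexts into a single vector and broadcast it over target positions
context_agg = tf.keras.layers.GlobalAveragePooling1D()(context_reprs)
context_agg = tf.keras.layers.Reshape((1, hidden))(context_agg)
combined = tf.keras.layers.Dense(hidden, activation="gelu")(target_tokens) + \
    tf.keras.layers.Dense(hidden)(context_agg)              # target-context combination

mlm_logits = tf.keras.layers.Dense(vocab)(combined)         # MLM head over the vocabulary
model = tf.keras.Model([target_tokens, context_reprs], mlm_logits)
model.summary()
```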

This project is still in progress.

NLP transformers DistilBERT TensorFlow huggingface ML

Understanding cognitive dimensions of political identity using NLP

I am part of an interdisciplinary consortium investigating the factors that shape political identity at the national and transnational level. As a postdoc at the Interacting Minds Centre at Aarhus University, I am working on understanding which cognitive representations underlie people’s attachment to national and transnational institutions, using multilingual corpora of social media data and open-text survey responses.
applied NLP computational social science data science ML

Cognitive diversity promotes collective creativity: an agent-based simulation

In this project (a collaboration with Kristian Tylén from Aarhus University), we use agent-based simulations to investigate: a) whether and how performing a divergent thinking task with others makes us come up with better and more creative solutions; b) how this is modulated by cognitive diversity within the group.
We have published a paper in the 2022 CogSci proceedings, which is available here: https://escholarship.org/uc/item/58v5d82w. Code is publicly available here, and follow-ups are in progress.
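The actual model is in the linked repository; purely as a toy illustration of the general intuition (invented numbers and rules, not the published simulation), consider pairs of agents whose solution repertoires overlap to varying degrees:

```python
# Toy illustration only (invented numbers and rules, NOT the published model):
# two agents each know 50 candidate solutions; the number of shared solutions
# controls cognitive diversity, and the pair's best find comes from the union.
import random

random.seed(1)
QUALITY = {i: random.random() for i in range(1000)}      # solution id -> quality

def simulate_pair(n_shared, n_each=50):
    pool = list(QUALITY)
    shared = random.sample(pool, n_shared)
    rest = [i for i in pool if i not in shared]
    a = set(shared) | set(random.sample(rest, n_each - n_shared))
    b = set(shared) | set(random.sample(rest, n_each - n_shared))
    return max(QUALITY[i] for i in a | b)                # best joint solution

for n_shared in (0, 25, 50):                             # low overlap = high diversity
    runs = [simulate_pair(n_shared) for _ in range(500)]
    print(f"shared={n_shared}: mean best quality {sum(runs) / len(runs):.3f}")
```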
agent-based modeling social cognition NLP creativity

Introducing benchmarks for the evaluation of psychological models

Quantitative research in psychology and neighboring fields emphasizes explanation and in-sample effect sizes over demonstrating models’ ability to predict on unseen data (generalization).
In a methods paper that interleaves theoretical arguments with empirical demonstrations (code available in this repo), we show how psychology would benefit from adopting benchmarking as a consensus paradigm for model evaluation.
We discuss how psychology can learn from both the strengths and the known weaknesses (e.g., biases, overfitting) of benchmarking in ML, outline first steps for introducing these practices in the field, and highlight their potential to increase the practical utility of the outputs of psychological research.
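A minimal, invented illustration of the gap this argument turns on (in-sample fit vs. prediction on held-out data):

```python
# Minimal illustration (simulated data) of why out-of-sample evaluation matters:
# a flexible model can look considerably better in-sample than on held-out data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = 0.3 * x[:, 0] + rng.normal(scale=1.0, size=200)   # weak true effect, noisy data

x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)
model = make_pipeline(PolynomialFeatures(degree=10), LinearRegression())
model.fit(x_tr, y_tr)

print("in-sample R^2:    ", round(model.score(x_tr, y_tr), 3))
print("out-of-sample R^2:", round(model.score(x_te, y_te), 3))
```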
This article has been published in Advances in Methods and Practices in Psychological Science, and it is available at: https://journals.sagepub.com/doi/full/10.1177/25152459211026864
research methods evaluation machine learning

Complex systems modeling for humanitarian action: methods and opportunities

In Summer 2021, I worked at the United Nations’ Centre for Humanitarian Data as a Predictive Analytics Data Fellow. During my fellowship, I conducted research on how complex systems modeling (e.g., agent-based simulations, systems dynamics, network models) can be used to understand humanitarian crises, monitor their unfolding and simulate the effectiveness of response plans ahead of implementation.
My research involved methods reviews, interviews with humanitarian actors, preliminary data analysis, and a final report which introduces complex systems modeling for humanitarians and outlines recommendations on contexts and technical requirements for a pilot. The report is available at this link, and a blog post summarizing it is available here.
complex systems predictive modeling humanitarian data social good

The neural underpinnings of spatial demonstratives

Spatial demonstratives are words like ‘this’ and ‘that’, used to direct people’s attentional focus. They are extremely frequent, yet far from simple: understanding what they refer to requires not only knowing the language, but also the context in which they are uttered.
As part of my PhD, I ran a naturalistic fMRI study combining synthesized dialogical narratives, fast multiband acquisition, and finite impulse response modeling to understand how the brain makes sense of them.
I found that spatial words engaged dorsal regions of the brain implicated not only in language, but in various aspects of visuospatial cognition, supporting distributed views of language processing.
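As a generic illustration of the modeling approach (using nilearn with placeholder event timings and parameters, not the study’s original pipeline):

```python
# Illustrative sketch (placeholder events and timing, not the study's pipeline):
# building a finite impulse response (FIR) design matrix, so the response to
# each demonstrative is modeled without assuming a canonical HRF shape.
import numpy as np
import pandas as pd
from nilearn.glm.first_level import make_first_level_design_matrix

t_r = 0.5                                   # fast multiband acquisition (seconds)
frame_times = np.arange(600) * t_r          # one 5-minute run
events = pd.DataFrame({
    "onset": [10.0, 42.5, 95.0, 140.0],     # demonstrative onsets in the narrative
    "duration": [0.0, 0.0, 0.0, 0.0],
    "trial_type": ["this", "that", "this", "that"],
})

design = make_first_level_design_matrix(
    frame_times, events, hrf_model="fir", fir_delays=np.arange(0, 20)
)
print(design.columns.tolist())
```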
This study has been published in NeuroImage, and it is available here: https://www.sciencedirect.com/science/article/pii/S1053811919307190
neuroimaging language spatial cognition research methods

Investigating social modulations of spatial representations through language

Humans perceive space as functional to action. Several studies have shown that humans organize space into a peripersonal (i.e., within reach) and an extrapersonal (i.e., outside reach) region.
Interestingly, many of the actions we perform in our daily life are carried out together with others. Do we adapt the way in which we parse space as near vs. far to the position of other people when action goals are shared?
We tested this hypothesis across two interactive experiments using language as a proxy for spatial representations. We found that, in the context of joint action, linguistic coding of locations as proximal vs. distal is based on the position of the partner rather than on one’s own.
These studies (part of my PhD) are published in Scientific Reports. The article is available here: https://www.nature.com/articles/s41598-019-51134-8
social cognition language spatial cognition research methods

The semantics of spatial demonstratives

Spatial demonstratives (words like ‘this’ and ‘that’) are thought to map onto a distinction between near and far space. Yet, when people are asked to pair a noun with a demonstrative without any spatial context, their choices tend to be non-random. Across a number of large-scale online experiments, I investigated which semantic features of a referent determine which demonstrative people tend to use to refer to it.
Using PCA and multilevel linear modeling, we found that demonstrative choice is systematically influenced by a range of factors, including manipulability, valence, and potential for motion. Importantly, the resulting experimental paradigm (the ‘demonstrative choice task’) has been used across a number of languages with consistent results, and it is currently being used in follow-up studies to investigate whether linguistic behavior in the demonstrative choice task can serve as a predictor of personality and clinical traits.
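As a generic sketch of this analysis pattern (simulated ratings and choices in place of the real data, and a linear rather than logistic multilevel model for brevity):

```python
# Generic sketch with simulated data (not the study's): reduce semantic-feature
# ratings with PCA, then model demonstrative choice with a multilevel model
# including random intercepts per participant.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 600
ratings = rng.normal(size=(n, 8))            # e.g., manipulability, valence, ...
pcs = PCA(n_components=2).fit_transform(ratings)

data = pd.DataFrame({
    "choice": rng.integers(0, 2, size=n),    # 0 = 'this', 1 = 'that'
    "pc1": pcs[:, 0],
    "pc2": pcs[:, 1],
    "participant": rng.integers(0, 30, size=n),
})

# Linear probability multilevel model for simplicity; a logistic mixed model
# would be the more principled choice for a binary outcome.
model = smf.mixedlm("choice ~ pc1 + pc2", data, groups=data["participant"]).fit()
print(model.summary())
```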
Studies from my PhD using this paradigm have been published in PLOS ONE, Frontiers in Psychology, and Language and Cognition, and two more are currently in progress.
spatial cognition language research methods