Ruben L. Bach

Statistics | Data Science | Survey Methodology

Hi – I’m Ruben, a postdoctoral researcher at the University of Mannheim (see also here) working on quantitative research methods in the social sciences. I’m especially interested in topics related to big data in the social sciences, machine learning, causal inference and survey research.

About me

Prior to joining the University of Mannheim, I spent three years in the graduate program of the Institute for Employment Research (IAB), where I wrote my dissertation on behavioral consequences of repeated measurements in social science surveys using methods of causal inference and machine learning. Recently, my dissertation was awarded the Lorenz-von-Stein Award of the Mannheim Centre for European Social Research (MZES) at the University of Mannheim.

If you want to know more about me, check out my CV! If you prefer an academic-style version of my CV (including those long and boring lists like conference presentations, journals I reviewed for, workshops I organized etc.), look here.

My work

My current research projects involve using digital trace data (online and mobile web activity) and social media data (Reddit) as well as machine learning and natural language processing techniques for social research.

In this field, I recently published findings on the accuracy of predicted personal sensitive information (voting behavior and political preferences) in Social Science Computer Review (open access). You can find a pdf of the paper here.

A major concern arising from ubiquitous tracking of individuals’ online activity is that algorithms may be trained to predict personal sensitive information, even for users who do not wish to reveal such information. Although previous research has shown that digital trace data can accurately predict sociodemographic characteristics, little is known about the potentials of such data to predict sensitive outcomes. Against this background, we investigate in this article whether we can accurately predict voting behavior, which is considered personal sensitive information in Germany and subject to strict privacy regulations. Using records of web browsing and mobile device usage of about 2,000 online users eligible to vote in the 2017 German federal election combined with survey data from the same individuals, we find that online activities do not predict (self-reported) voting well in this population. These findings add to the debate about users’ limited control over (inaccurate) personal information flows.

If you want to work with Reddit data, I highly recommend that you check out this article!

Furthermore, I have recently started to study how automated decision making and AI-based systems may create new or foster existing social inequalities and discrimination!

Besides my work on new data and new methods for social research, I continue to work on projects that extend and build on the research I did for my dissertation. If you’re interested, check out these two papers (Rotation Group Bias in the Consumer Expenditure Survey and Misreporting on Mobile Devices).

If you’re interested in my dissertation work on behavioral changes due to repeated survey participation, check out my work in the Journal of the Royal Statistical Society, Series A: Statistics in Society (JRSS-A) (pdf). A second paper on changes in reporting over the waves of a panel survey (panel conditioning) is available in the Journal of Survey Statistics and Methodology (JSSAM). The final empirical chapter from my dissertation (on the connection between response propensities and misreporting in surveys) also appeared in JSSAM (link).

Get in touch!

If you want to know more about me or my work, or if you are interested in collaborating, send me an email or contact me on LinkedIn or Twitter.