About the module

What is this module about?

This module introduces students to the fundamental techniques, concepts and contemporary discussions across the broad field of data science. As data and data-related artefacts become ubiquitous in all aspects of social life, data science gains access to new sources of data, is taken up across an expanding range of research fields and disciplines, and increasingly engages with societal challenges. The module provides an advanced introduction to the theoretical and scientific frameworks of data science, and to the fundamental techniques for working with data using appropriate procedures, algorithms and visualisation. Students learn how to approach data and data-driven artefacts critically, and how to engage with and reflect critically on contemporary discussions around the practice of data science, its compatibility with different analytics frameworks and disciplines, and its relation to ongoing digital transformations of society. Alongside lectures discussing the theoretical, scientific and ethical frameworks of data science, the module features coding labs and workshops that expose students to the practice of working effectively with data, algorithms and analytical techniques. These sessions also provide a platform for reflective and critical discussion of data science practices, of the resulting data artefacts, and of how those artefacts are interpreted, acted upon and come to influence society.

What does this module aim to achieve?

In this module, students gain both formal knowledge and practical experience of the theoretical, scientific and ethical frameworks underpinning data science, and critically reflect on the scope and impact of these frameworks. Lectures provide a grounded understanding of the theoretical and scientific frameworks underpinning data science. In workshops, students gain experience of the fundamentals of data science practice, and through seminars they engage with academic debates in data studies and related fields about the changing role of data science in society. Examples include the increasing use of data artefacts in policy and decision making by governmental bodies and businesses, how scientific discoveries are made and communicated, and how (in)equalities and power (im)balances surface in the uses of data. The module aims to build the skills required to apply data science techniques and algorithms within and across analytics frameworks developed in different disciplines. It also aims to cultivate a holistic data science practice that reviews the whole data science process critically and inquisitively, and approaches problems through user-centred thinking. This practice also embraces critical reflection on data, algorithms and data artefacts, as well as on the ethical, societal and cultural implications of data science broadly conceived.

Learning outcomes

  • Demonstrate an in-depth understanding of the theoretical underpinnings and the scientific and ethical frameworks of data science as applied across disciplines
  • Demonstrate a critical understanding of the role that data and data-intensive practices play in research, industry and the wider society
  • Demonstrate an understanding of the workings and the practicalities of the data science process
  • Apply and evaluate data science techniques and tools for particular scenarios and argue their suitability
  • Demonstrate an ability to critique any resulting data artefacts, ranging from data-informed decisions to data-driven models, including from a user-centred perspective
  • Develop and demonstrate an understanding of the societal, ethical, and cultural implications of advances in and applications of data science

Teaching timetable

  • Q&A sessions based on lecture content: Weeks 1-5 and 7-8
    • Whole-group Q&A at LIB2: Wednesdays 10am-11am
  • Lab sessions: Weeks 1-5 and 7-10
    • Lab Group 1: Fridays 10am-12pm, in person, at JX2.02
    • Lab Group 2: Fridays 2pm-4pm, in person, at FAB5.03

FLIPPED CLASSROOM: IM939 broadly follows a flipped classroom model. This means that the in-person classes will mostly be used for discussions, Q&As, hands-on activities and “doing” data science. Less time will be dedicated to traditional lectures during the in-person sessions; instead, pre-recorded lecture videos will be provided as learning material.

Lecture materials, such as slides and reading lists, and pre-recorded lecture videos introducing the week’s content will be made available each week. You are expected to go through these materials before the in-person sessions. We will use the Q&A sessions to go over any open questions and to reflect collectively on the week’s material. In most weeks, we will start with a brief recap of what was discussed in the pre-recorded lecture materials.

Assessment

The assessments are individual and involve two components: a critical review and a data-driven essay. For the critical review, students approach a selected data science project through one of the critical lenses covered during the lectures. In this short report, students are expected to engage with the related literature and reflect on the decisions made by the project’s researchers. For the second component, the data-driven essay, students report on a data science project that they have carried out on a question of their choice using an appropriate data set. The essay reports on the data science process from initiation through evaluation to reflection, while engaging with the relevant literature in the domain. The length of both components depends on the number of CATS a student wishes to complete, as shown below.

Component                 15 CATS        20 CATS        30 CATS
Critical Review (40%)     1000 words     1250 words     1500 words
Final Essay (60%)         1500 words     2000 words     3000 words

Illustrative Bibliography

  • Data science as a scientific practice: Dhar, V., 2013. Data science and prediction. Communications of the ACM, 56(12), pp.64-73.

  • Iliadis, A. and Russo, F., 2016. Critical data studies: An introduction. Big Data & Society, 3(2), p.2053951716674238.

  • Ginsberg, J. et al., 2008. Detecting influenza epidemics using search engine query data. Nature.

  • Kandel, S. et al., 2012. Enterprise data analysis and visualization: An interview study. IEEE Transactions on Visualization and Computer Graphics, 18(12), pp.2917-2926.

  • Osborne, J., 2002. Notes on the use of data transformations. Practical Assessment, Research & Evaluation, 8(6), pp.1-8.

  • Osborne, J.W. and Overbay, A., 2004. The power of outliers (and why researchers should always check for them). Practical Assessment, Research & Evaluation, 9(6), pp.1-12.

  • Guyon, I. and Elisseeff, A., 2003. An introduction to variable and feature selection. The Journal of Machine Learning Research, 3, pp.1157-1182.

  • Ringnér, M., 2008. What is principal component analysis? Nature Biotechnology, 26(3), p.303.

  • Jaworska, N. and Chupetlovska-Anastasova, A., 2009. A review of multidimensional scaling (MDS) and its utility in various psychological domains. Tutorials in Quantitative Methods for Psychology, 5(1), pp.1-10.

  • Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence.

  • White, D.R. and Borgatti, S.P., 1994. Betweenness centrality measures for directed graphs. Social Networks, 16(4), pp.335-346.

  • Heer, J. and Shneiderman, B., 2012. Interactive dynamics for visual analysis. Queue, 10(2), p.30.

  • Ruckenstein, M. and Schüll, N.D., 2017. The datafication of health. Annual Review of Anthropology, 46, pp.261-278.

  • Pink, S., Ruckenstein, M., Willim, R. and Duque, M., 2018. Broken data: Conceptualising data in an emerging world. Big Data & Society, 5(1), p.2053951717753228.