1  Introduction

This week introduces data science as a field that cuts across disciplines and provides a historical perspective on the subject. We discuss the terms Data Science and Data Scientists, reflect on examples of Data Science projects, and discuss the research process at a methodological level. We will also use the examples as probes to think broadly about the potential influence of data-intensive scientific approaches on knowledge, industry and wider society.

The practical lab session helps students get acquainted with the analytical platform that will be used throughout the term and provides a first experience of working with data sets within a data science approach.

1.1 Highlights of the lecture

This week we start with an introduction in which we look at how the module operates and discuss its basic objectives and definitions.

Some of the key concepts you should remember from this week are:

  • the discussion about the terms Data Science and Data Scientists
  • the DS process and basic concepts from each step of the process
  • the importance of being critical and inquisitive in data science
  • different analyst types and skills

1.2 Practical Lab Session

This is mainly a setup week in which you are introduced to the coding environment and to Python.

At the end of the session, you should:

  • have installed Anaconda and run it from your account
  • have tried out basic Python commands and reflected on how they operate
  • have loaded your first data file into Python and read the data in it
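As a rough illustration of the last point, the sketch below loads a small CSV file and reads its rows using only Python's standard library. The file name, the column names and the sample values are made up so that the example runs on its own; they are not the lab's actual data set, and the session itself may use a different library or file format.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample data, written to a temporary file so the example
# is self-contained. In the lab, point `data_path` at your own file.
sample_text = "item,count\napples,3\npears,5\n"
data_path = Path(tempfile.mkdtemp()) / "sample.csv"
data_path.write_text(sample_text, encoding="utf-8")


def load_rows(path):
    """Read a CSV file and return its rows as a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        # DictReader uses the first line of the file as the column names.
        return list(csv.DictReader(handle))


rows = load_rows(data_path)
print(len(rows), "rows loaded")
for row in rows:
    # Values are read as text, so convert numbers explicitly.
    print(row["item"], int(row["count"]))
```

Every value comes back as a string, which is why the count is converted with int() before use. Checking what type your data has after loading is a good first habit when reading any new file.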

1.3 Reading lists & Resources

1.3.1 Required reading

  • On the origins of Data Science and Data Analysis (first 10 pages): Tukey, J.W., 1962. The future of data analysis. The annals of mathematical statistics, 33(1), pp.1-67. [pdf]
  • A formal look at data science: Dhar, V., 2013. Data science and prediction. Communications of the ACM, 56(12), pp.64-73. [library pdf link]
  • A systematic study of enterprise analysts, their daily tasks, and challenges: Kandel, Sean, et al. “Enterprise data analysis and visualization: An interview study.” Visualization and Computer Graphics, IEEE Transactions on 18.12 (2012): 2917-2926.
  • On Google’s influenza epidemic application: Ginsberg, Jeremy, et al. “Detecting influenza epidemics using search engine query data.” Nature (2008) (you will need to search for this through our library)
  • On the critique of the Google Flu Trends project: Lazer, D., Kennedy, R., King, G. and Vespignani, A., 2014. The parable of Google Flu: traps in big data analysis. Science, 343(6176), pp.1203-1205. [pdf]
  • An applied Data Science example: Quercia, D., Schifanella, R. and Aiello, L.M., 2014, September. The shortest path to happiness: Recommending beautiful, quiet, and happy routes in the city. In Proceedings of the 25th ACM conference on Hypertext and social media (pp. 116-125). [pdf]

1.3.2 Optional reading and resources

  • On the information pyramid: Ackoff, R.L., 1989. From data to wisdom. Journal of applied systems analysis, 16(1), pp.3-9. [a pdf link to a short extract]
  • A critique of the information pyramid: https://hbr.org/2010/02/data-is-to-info-as-info-is-not
  • The survey on analyst types and skills: Analyzing the Analyzers, by Harlan Harris, Sean Murphy, Marck Vaisman
  • A public-facing intro to Data Science: Data Science: A guide for society, by Sense about Science [pdf link]
  • On data biography: D’Ignazio, C., 2017. Creative data literacy: Bridging the gap between the data-haves and data-have nots. Information Design Journal, 23(1), pp.6-18. [pdf]