The tables below provide an indicative overview of the courses usually offered in the first and second years of the MSc in Data Science. The list may be subject to minor amendments each academic year. The final list of courses for each academic year can be found in the Course Offer.

Further information is also available in the Student handbook.

First year

Curriculum A - Mandatory courses

Course Credits (ECTS)

Foundation of Social and Psychological Science: ICT and Social Science theories and models and ICT and cognitive psychology theories and models

Data science and Sociology: methods and applications

This module will introduce the main analytical and methodological dimensions of sociological research and will place data science approaches and methods of analysis in relation to them. At the end of the module, students will be able to:

  • read and understand sociological research by identifying the research questions, the characteristics, the potential and the limits of the methodological choices made
  • develop a critical awareness of the nature and uses of Big Data within sociological research
  • formulate relevant research questions and design research plans that are consistent with them and that exploit the potential of Data Science in a conscious and appropriate manner
  • present and communicate the results to a wider audience of potential stakeholders.

Data science and Psychology: Methods and applications

This course provides a first introduction to the most relevant constructs and theoretical models in the psychological sciences, using a descriptive framework that highlights important connections between psychological theories and some of the most recent advances in Data Science. The main idea is to show how traditional psychological research fields, such as work and organizational psychology and applied cognitive psychology, can benefit from an approach based on big data. Students will learn to recognize and appreciate the specific characteristics of psychological data, and to manage a data analysis problem through a hierarchical, multiphase process that extracts useful and relevant information with respect to specific empirical hypotheses.

12

Data Mining

This graduate-level course studies the mathematical models, computational paradigms, algorithms and methodologies used to find patterns and regularities in large amounts of raw data, in order to understand natural phenomena, business operations and human behavior, and to make predictions, forecasts and performance improvements. The goal of the course is to introduce students to the basic concepts, principles and techniques of Data Mining, to help them develop the skills needed to apply state-of-the-art data mining algorithms to practical problems, and to give them the experience that will later allow them to work independently, efficiently and effectively in highly competitive markets. The course also aims to support students wishing to pursue a research career by teaching them the methodologies for carrying out independent and effective research. By the end of the course, students will be familiar with the most popular data mining concepts and able to identify and apply the right solutions to the data analysis problems they encounter. Last but not least, future data scientists will learn how to design the right experiments, interpret the results correctly and present them effectively. In particular, students will be able to handle the definition of Data Mining and how it differs from Machine Learning and AI, Similarity Techniques, Clustering, Association Rules, Frequent Itemsets, Recommendation Systems, Online Advertising, Classification, Dimensionality Reduction, Graph Processing, and Visualization.

6
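As an illustration of two topics listed above (frequent itemsets and similarity techniques), the following minimal Python sketch counts frequent item pairs in a list of market baskets and computes the Jaccard similarity of two sets. It is not course material: the function names `frequent_pairs` and `jaccard` and the example baskets are hypothetical, and practical frequent-itemset algorithms such as A-Priori prune candidates (every subset of a frequent itemset must itself be frequent) instead of enumerating every pair.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Count each unordered item pair once per basket; keep pairs with support >= min_support."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so ("a", "b") and ("b", "a") map to the same key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(a & b) / len(a | b)


baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
print(frequent_pairs(baskets, min_support=3))
print(jaccard(["bread", "milk"], ["bread", "milk", "diapers"]))
```

With `min_support=3`, the sketch keeps the four pairs that occur in at least three of the five baskets.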

Big Data Technologies

Students will learn how to leverage Big Data frameworks and configure them, what is needed in order to use them, and what benefits to expect from them. The course covers two areas: processing, by introducing new programming and data processing approaches, and storage and querying, by presenting new systems designed for such data. At the end of the course, students will be able to face real-world challenges: identifying the right solutions in real-life situations involving Big Data, making the right choices when deploying, configuring and using big data systems, and performing the required maintenance and optimization tasks. The course is fundamental for modern data scientists, since it provides the required knowledge of the tools available for achieving their goals. In particular, students will gain knowledge of:

  1. Introduction to Big Data, Relational Model Principles
  2. Big Data Management: MapReduce, Unix, Virtual Machines, HDFS, Hadoop, Hive, Pig, Spark (SQL, DataFrame, MLlib, GraphX)
  3. NoSQL: MongoDB, CouchDB (Document Databases), Neo4j (Graph Databases), Oracle NoSQL / Berkeley DB, Riak (Key-Value Stores), HBase (Column-Family Stores), PostgreSQL (Relational extensions).

6
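To make the MapReduce model in item 2 concrete, here is a single-process Python sketch of the classic word-count job, split into map, shuffle and reduce steps. It only mimics the programming model in plain Python; it does not use the Hadoop or Spark APIs, and the function names are hypothetical. On a real cluster the framework distributes the map and reduce calls across machines and performs the shuffle itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["The cat sat", "the dog sat"]))
```

Because the reduce step only needs all values for one key, the keys can be reduced independently, which is what lets a real framework parallelize this stage.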

Professional English for Data Science

The objective of the B2-level English Test is to measure a candidate's linguistic competence in listening, reading, writing and speaking according to the B2 competence descriptors of the CEFR (Common European Framework of Reference for Languages). The tests require candidates to demonstrate their competence in English in a range of academic, personal and professional contexts. The result of the test is PASS or FAIL, and indicates whether a candidate can cope, with a reasonable degree of autonomy, in an English-medium academic or familiar professional context.

3

Statistical Learning: Statistical Methods and Statistical Models

Module Statistical Methods: the student will learn the principles and practice of statistical inference, with a focus on the likelihood-based approach and the linear regression model. In particular, after a brief review of the basic principles of probability and random variables, the first part of the course will allow students:

  • to develop a deep understanding of the concept of likelihood function and its characteristics
  • to perform maximum likelihood estimation
  • to perform hypothesis testing and construct confidence intervals through the likelihood ratio method and its variants.
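The likelihood ideas in this first part can be illustrated with a small worked example. The Python sketch below assumes i.i.d. Poisson counts (a model chosen here only for illustration; the course description does not name one) and computes the log-likelihood, the maximum likelihood estimate and the likelihood ratio statistic for a simple null hypothesis. The data are made up, the sample is assumed to contain at least one positive count (the log-likelihood needs a positive rate), and the chi-square cut-off relies on the usual large-sample approximation.

```python
import math


def poisson_log_likelihood(lam, data):
    """Log-likelihood of i.i.d. Poisson(lam) observations; lgamma(x + 1) = log(x!)."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)


def poisson_mle(data):
    """The Poisson MLE of the rate is the sample mean (the root of the score equation)."""
    return sum(data) / len(data)


def likelihood_ratio_statistic(data, lam0):
    """W = 2 [l(lam_hat) - l(lam0)]; approximately chi-square(1) under H0: lam = lam0."""
    lam_hat = poisson_mle(data)
    return 2 * (poisson_log_likelihood(lam_hat, data) - poisson_log_likelihood(lam0, data))


CHI2_1_CRITICAL_95 = 3.841  # 95% quantile of the chi-square distribution with 1 df

counts = [2, 3, 1, 4, 0, 2, 3, 5, 2, 3]  # hypothetical event counts
w = likelihood_ratio_statistic(counts, lam0=1.0)
print(poisson_mle(counts), w, w > CHI2_1_CRITICAL_95)
```

Here the estimate is the sample mean 2.5, and the statistic for H0: rate = 1 is far above 3.841, so the sketch would reject that null hypothesis at the 5% level.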

The second part of the course develops knowledge of the linear regression modeling framework and the ability to apply it in different practical contexts. By the end of the course, students should be able:

  • to specify and estimate a linear regression model according to the empirical situation under study
  • to perform hypothesis testing to compare models and construct confidence intervals for model parameters and for predictions
  • to detect and deal with the main violations of model assumptions: multicollinearity, heteroscedasticity and correlated errors.
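For the second part, the following Python sketch fits a simple linear regression (one predictor) by ordinary least squares and computes the standard error and t statistic of the slope, the quantities behind the hypothesis tests and confidence intervals listed above. It is a minimal illustration under the standard assumptions (independent, homoscedastic errors and at least two distinct x values); it does not cover multiple regression or the diagnostics for assumption violations, and the function name and data are hypothetical.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, se_b1)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the error variance")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # unbiased error variance
    se_b1 = math.sqrt(sigma2 / sxx)
    return b0, b1, se_b1


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical measurements
b0, b1, se_b1 = fit_simple_ols(x, y)
t_stat = b1 / se_b1  # t statistic for H0: b1 = 0, with n - 2 degrees of freedom
print(b0, b1, se_b1, t_stat)
```

The t statistic would then be compared with a quantile of the t distribution with n - 2 degrees of freedom, and the same standard error gives a confidence interval for the slope.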

Module Statistical Models: after successful completion of the module, students are able to understand and apply the basic notions, concepts and methods of computational linear algebra, convex optimization and multivariate statistical methods for data analysis and dimension reduction problems. They master generalized linear models, for instance for the analysis of discrete variables, as well as the use of the singular value decomposition, principal component analysis and random matrices for low-dimensional data representations. They know techniques such as Linear and Quadratic Discriminant Analysis, Multidimensional Scaling, and Factor and Correspondence Analysis. They also know the basics of sparse recovery problems, including compressed sensing, low-rank matrix recovery and dictionary learning algorithms.

12

Law and Data

The course aims to introduce students to the study of the different legal issues related to data management. It therefore begins with the basics needed to understand the legal aspects. Particular attention will be paid to the phenomena that go under the names of "Open Data" and "Big Data", followed by the study of intellectual property rights (copyright, sui generis right on databases, etc.) and the contractual instruments that allow their circulation (licences). Finally, the focus will be on data protection rules, with particular attention to the management of research data.

6

Information, Knowledge and Service Management

The course aims to give students a solid general understanding of data, information and knowledge management, focusing on concepts, methods and tools that can be used to enable innovation and change management within organizations. Particular attention is given to service science, an interdisciplinary approach to the study, design and implementation of service systems. Organizations are complex systems in which interdependent people and technologies take actions and create value. The course enables students to study in depth the theories, methods and techniques of information and knowledge management, information systems and data management, open data, double web platforms, innovative business models based on open innovation, and emerging technological phenomena such as crowdsourcing and gamification. At the end of the course, students will be able to identify the processes of creating and managing information and business knowledge, and to understand how to manage the information and knowledge processes that influence change management. Finally, students will be able to use gamification mechanics to promote effective management of information, knowledge and the behavior of the actors involved.

6

Data Visualization Lab

The course provides a basic introduction to the concepts and tools for data exploration and visualization, through class lectures and lab sessions. The core of the class is the exploration of the theoretical foundations and the practice of various dimensionality reduction strategies, from basic procedures to more advanced state-of-the-art algorithms. The basics of clustering theory are also covered. These topics are complemented by a discussion of the principles of data visualization through different types of graphics. At the end of the course, students will be able to:

  • describe the overall structure of a multidimensional dataset
  • effectively project a multidimensional dataset onto a lower-dimensional space, highlighting its main features
  • choose a suitable graphical representation to pinpoint one or more quantitative aspects of the dataset
  • write the code to implement the chosen graph in one of the languages/environments shown in class.

6
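As a concrete instance of the dimensionality reduction strategies mentioned above, here is a minimal Python sketch of principal component analysis restricted to the first component: it centers the data, builds the sample covariance matrix, finds its leading eigenvector by power iteration and projects each point onto it. This is one simple way to compute PCA, chosen for illustration and not necessarily the approach used in class (libraries typically rely on the singular value decomposition). The function name and data are hypothetical, and power iteration assumes the starting vector is not orthogonal to the leading eigenvector.

```python
import math


def first_principal_component(data, iters=200):
    """Leading principal direction and 1-D scores of `data` (a list of equal-length rows)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and renormalize.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # zero covariance (constant data): no direction of variation
        v = [c / norm for c in w]
    # Project each centered point onto the leading direction.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores


points = [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]]
direction, scores = first_principal_component(points)
print(direction, scores)
```

For these nearly collinear points, the direction found is close to the diagonal, and the one-dimensional scores preserve almost all of the variance of the original two-dimensional data.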

Introduction to Machine Learning

The course aims to give a broad introduction to Machine Learning, with a particular focus on applications. Classes are evenly divided between theory and practice. In the first, theoretical part, students are guided through several machine learning approaches suited to different application tasks; the second, practical part takes a more "hands-on" approach, from data acquisition to managing the training process and using machine learning tools. During the course, students will learn to analyze a Machine Learning project in its entirety. By the end, they will be able to assess which type of algorithm to use (supervised, unsupervised, few-shot, etc.), interpret results and work towards objectives. The lab classes are held in Python, using open-source toolboxes. The final project covers general machine learning topics.

6
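To illustrate the supervised setting and the evaluation of results mentioned above, the following Python sketch implements a k-nearest-neighbours classifier and an accuracy score using only the standard library. It is a hypothetical toy example, not the course's lab material (which, according to the description, uses open-source toolboxes); in practice one would use such a toolbox and evaluate on a properly held-out test set.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of point x by majority vote among its k nearest training points."""
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_X = [(0.5, 0.5), (5.5, 5.5)]
test_y = ["a", "b"]
predictions = [knn_predict(train_X, train_y, x) for x in test_X]
print(predictions, accuracy(test_y, predictions))
```

The choice of k is itself a design decision: small values follow the training data closely, larger values smooth the decision boundary.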

 

Curriculum B - Mandatory courses

Course Credits (ECTS)

Scientific Programming: Programming and Algorithms and Data Structure

Programming module: the goal of the course is to introduce the Python programming language, one of the most widely used languages for scientific computing, and related technologies. At the end of this course, students will be able to: a. remember the syntax and semantics of the Python language; b. understand programs written by others; c. analyze a simple data analysis task and reformulate it as a programming problem; d. evaluate which features of the language (and related scientific libraries) can be used to solve the task; e. construct a Python program that appropriately solves the task; f. evaluate the results of the program.

Algorithms and Data Structure module: the overall goal of this course is to introduce students to the design and analysis of algorithmic solutions, by presenting the most important classes of algorithms and evaluating their performance. At the end of the course, students will be able to: a. describe classic algorithms and understand their behavior; b. understand, at a basic level, the most important algorithm design techniques; c. evaluate algorithmic choices and select those that best suit their problems; d. analyze the complexity of algorithms; e. design simple algorithmic solutions to basic problems and implement them in Python.

12
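As a small example of the algorithm design and complexity analysis described in the Algorithms and Data Structure module, the Python sketch below implements binary search on a sorted list. Each iteration halves the search interval, so it performs O(log n) comparisons, against O(n) for a linear scan. The function name is hypothetical; Python's standard library offers the same idea in the `bisect` module.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return -1


print(binary_search([1, 3, 5, 7, 9, 11], 7), binary_search([1, 3, 5, 7, 9, 11], 4))
```

The precondition matters: on an unsorted list the halving argument no longer holds and the function can miss elements that are present.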

Linear algebra for statistics

The course aims to provide a basic working knowledge of linear algebra and elementary calculus. After successfully attending the course, students will be able to:

  • understand the basic concepts of linear algebra and elementary calculus
  • compute fluently with vectors and matrices
  • compute fluently the derivatives of simple functions
  • understand the concepts of eigenvalues and eigenvectors
  • compute eigenvalues and eigenvectors in simple examples.
6
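For the last two outcomes, a worked example may help. For a symmetric 2x2 matrix A, the eigenvalues are the roots of the characteristic polynomial λ² - tr(A)·λ + det(A) = 0. The Python sketch below (function name hypothetical) solves it in closed form and builds a unit eigenvector for each eigenvalue, so that A v = λ v can be checked directly.

```python
import math


def eigen_symmetric_2x2(a, b, d):
    """Eigenvalues and unit eigenvectors of the symmetric matrix [[a, b], [b, d]]."""
    half_trace = (a + d) / 2
    # sqrt((tr/2)^2 - det) rewritten as hypot((a - d)/2, b), which is never negative.
    radius = math.hypot((a - d) / 2, b)
    eigenvalues = (half_trace + radius, half_trace - radius)
    if b == 0:
        # Diagonal matrix: the coordinate axes are eigenvectors.
        vectors = [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
        return eigenvalues, vectors
    vectors = []
    for lam in eigenvalues:
        # From the first row of (A - lam*I) v = 0: (a - lam) v1 + b v2 = 0.
        v1, v2 = b, lam - a
        norm = math.hypot(v1, v2)
        vectors.append((v1 / norm, v2 / norm))
    return eigenvalues, vectors


values, vectors = eigen_symmetric_2x2(2, 1, 2)
print(values, vectors)
```

For the matrix [[2, 1], [1, 2]], the trace is 4 and the determinant is 3, so the eigenvalues are 3 and 1, with eigenvectors along (1, 1) and (1, -1).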

Big Data Technologies

Students will learn how to leverage Big Data frameworks and configure them, what is needed in order to use them, and what benefits to expect from them. The course covers two areas: processing, by introducing new programming and data processing approaches, and storage and querying, by presenting new systems designed for such data. At the end of the course, students will be able to face real-world challenges: identifying the right solutions in real-life situations involving Big Data, making the right choices when deploying, configuring and using big data systems, and performing the required maintenance and optimization tasks. The course is fundamental for modern data scientists, since it provides the required knowledge of the tools available for achieving their goals. In particular, students will gain knowledge of:

  1. Introduction to Big Data, Relational Model Principles
  2. Big Data Management: MapReduce, Unix, Virtual Machines, HDFS, Hadoop, Hive, Pig, Spark (SQL, DataFrame, MLlib, GraphX)
  3. NoSQL: MongoDB, CouchDB (Document Databases), Neo4j (Graph Databases), Oracle NoSQL / Berkeley DB, Riak (Key-Value Stores), HBase (Column-Family Stores), PostgreSQL (Relational extensions).

6

Professional English for Data Science

The objective of the B2-level English Test is to measure a candidate's linguistic competence in listening, reading, writing and speaking according to the B2 competence descriptors of the CEFR (Common European Framework of Reference for Languages). The tests require candidates to demonstrate their competence in English in a range of academic, personal and professional contexts. The result of the test is PASS or FAIL, and indicates whether a candidate can cope, with a reasonable degree of autonomy, in an English-medium academic or familiar professional context.

3

Statistical Learning: Statistical Methods and Statistical Models

Module: “Statistical Methods” - The student will learn the principles and practice of statistical inference, with a focus on the likelihood-based approach and the linear regression model. In particular, after a brief review of the basic principles of probability and random variables, the first part of the course will allow students:

  • to develop a deep understanding of the concept of likelihood function and its characteristics
  • to perform maximum likelihood estimation
  • to perform hypothesis testing and construct confidence intervals through the likelihood ratio method and its variants.

The second part of the course develops knowledge of the linear regression modeling framework and the ability to apply it in different practical contexts. By the end of the course, students should be able:

  • to specify and estimate a linear regression model according to the empirical situation under study
  • to perform hypothesis testing to compare models and construct confidence intervals for model parameters and for predictions
  • to detect and deal with the main violations of model assumptions: multicollinearity, heteroscedasticity and correlated errors.

Module “Statistical Models” - After successful completion of the module, students are able to understand and apply the basic notions, concepts and methods of computational linear algebra, convex optimization and multivariate statistical methods for data analysis and dimension reduction problems. They master generalized linear models, for instance for the analysis of discrete variables, as well as the use of the singular value decomposition, principal component analysis and random matrices for low-dimensional data representations. They know techniques such as Linear and Quadratic Discriminant Analysis, Multidimensional Scaling, and Factor and Correspondence Analysis. They also know the basics of sparse recovery problems, including compressed sensing, low-rank matrix recovery and dictionary learning algorithms.

12

Computational Social Science

The module aims to provide an understanding of the main computational research methods that are specific to online media data and used to analyse social processes, with an emphasis on 'big data' sources. The module presents an overview of current cutting-edge quantitative methods for online social research: Web surveys, online experiments, opinion mining techniques, social network analysis and computational statistical models. At the end of the module, students will be able to:

  • understand the main principles at the core of the different computational methods applied to social science datasets
  • demonstrate a firm grasp of the coding process using software for automatic text analysis
  • apply social network analysis, combining network information with other types of data
  • understand what opinion mining is and what its core principles and techniques are
  • use computational models to analyse large survey datasets through techniques such as model-based recursive partitioning, latent class analysis and relational class analysis.

6
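Social network analysis often starts from simple measures such as degree centrality. The Python sketch below, a hypothetical example that is not taken from the course, computes the normalized degree centrality (number of distinct neighbours divided by n - 1, where n is the number of nodes) of each node in an undirected network given as an edge list. Duplicate edges are counted once and self-loops are ignored, which are assumptions of this sketch.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as (u, v) edges."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]
print(degree_centrality(edges))
```

In this small network, "cat" is connected to every other node and therefore has centrality 1.0, while "dan" has a single tie.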

Law and Data

The course aims to introduce students to the study of the different legal issues related to data management. It therefore begins with the basics needed to understand the legal aspects. Particular attention will be paid to the phenomena that go under the names of "Open Data" and "Big Data", followed by the study of intellectual property rights (copyright, sui generis right on databases, etc.) and the contractual instruments that allow their circulation (licences). Finally, the focus will be on data protection rules, with particular attention to the management of research data.

6

Data Visualization Lab

The course provides a basic introduction to the concepts and tools for data exploration and visualization, through class lectures and lab sessions. The core of the class is the exploration of the theoretical foundations and the practice of various dimensionality reduction strategies, from basic procedures to more advanced state-of-the-art algorithms. The basics of clustering theory are also covered. These topics are complemented by a discussion of the principles of data visualization through different types of graphics. At the end of the course, students will be able to:

  • describe the overall structure of a multidimensional dataset
  • effectively project a multidimensional dataset onto a lower-dimensional space, highlighting its main features
  • choose a suitable graphical representation to pinpoint one or more quantitative aspects of the dataset
  • write the code to implement the chosen graph in one of the languages/environments shown in class.

6

Introduction to Machine Learning

The course aims to give a broad introduction to Machine Learning, with a particular focus on applications. Classes are evenly divided between theory and practice. In the first, theoretical part, students are guided through several machine learning approaches suited to different application tasks; the second, practical part takes a more "hands-on" approach, from data acquisition to managing the training process and using machine learning tools. During the course, students will learn to analyze a Machine Learning project in its entirety. By the end, they will be able to assess which type of algorithm to use (supervised, unsupervised, few-shot, etc.), interpret results and work towards objectives. The lab classes are held in Python, using open-source toolboxes. The final project covers general machine learning topics.

6

 

Second year - Curriculum A and B

Elective and free choice courses

During the second year, students choose 18 ECTS of elective courses and 12 ECTS of free-choice courses from among the laboratories and theoretical courses listed in the Course Offer ('Manifesto degli studi') for each academic year. The Course Offer also states the rules for selecting these courses.

Mandatory courses

Course Credits (ECTS)

Internship

9

Final exam

18
Updated on 19 June 2023