Master in City & Technology 2021/22 – Term I
Seminar Name: Digital Tools & Big Data I – The basic tasks for big data analysis
Total Hours: 20 hours
Faculty: Diego Pajarito

Graph built using data from Google Trends

Syllabus

Data and computing power together have enhanced the methods and applications of advanced architecture. Data science and big data are two digital approaches to the analysis of massive and non-traditional data sources. They define workflows that arrange basic programming pieces to handle tasks such as data discovery and cleaning, descriptive statistics, visualisation and other data management tasks. The challenge is, therefore, to organise these tasks to deliver understandable outcomes. Since there is no one-size-fits-all strategy, data science builds on exploration and testing across big data tools. The goal of this course is to give students experience handling common tasks in big data, data science and data analytics. The course provides a practical perspective on the main activities involved in urban analytics. From data collection and ingestion to analysis and visualization, students will experience the whole workflow while getting hands-on practice extracting information from massive datasets.
The course has seven practical sessions in which students work directly with large data sets to develop the technical skills in high demand in big data projects. The sessions start by discovering big data sources, performing descriptive analytics and plotting different data sets to identify trends and correlations. The course then moves on to the spatial and temporal dimensions of big data sets and ways to represent features from these dimensions graphically. The last part of the course deals with data management tasks such as splitting, aggregating, merging and summarising datasets to improve analysis and visualization.
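As an illustration of the data management tasks named above (splitting, aggregating, merging and summarising), the following is a minimal sketch in plain Python. It is not taken from the course repository: the records, the `population` lookup table and all function names are hypothetical, and the values are invented for the example.

```python
from collections import defaultdict

# Hypothetical records: hourly bike counts per district (made-up values).
counts = [
    {"district": "A", "hour": 8, "bikes": 120},
    {"district": "A", "hour": 9, "bikes": 80},
    {"district": "B", "hour": 8, "bikes": 40},
    {"district": "B", "hour": 9, "bikes": 60},
]

# Hypothetical lookup table to merge with the counts.
population = {"A": 1000, "B": 500}


def split_by(records, key):
    """Split: group records by the value of one field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


def aggregate(groups, field):
    """Aggregate: sum one numeric field within each group."""
    return {k: sum(r[field] for r in rows) for k, rows in groups.items()}


def merge(totals, lookup):
    """Merge: join the totals with a second table on the shared key."""
    return {
        k: {"bikes": v, "population": lookup[k]}
        for k, v in totals.items()
        if k in lookup
    }


def summarise(merged):
    """Summarise: derive one indicator per group (bikes per inhabitant)."""
    return {
        k: round(row["bikes"] / row["population"], 3)
        for k, row in merged.items()
    }


groups = split_by(counts, "district")
totals = aggregate(groups, "bikes")
merged = merge(totals, population)
summary = summarise(merged)
print(summary)
```

In practice a data-frame library would replace these helper functions, but the four steps stay the same.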
The course follows a practical methodology in which students study key concepts during the sessions and work through exercises to understand them in depth. Students have access to a GitHub repository with a compilation of source code and examples for the topics and tools used during the course. Along with the exercises, each student will define a particular analysis scenario in which to apply the concepts seen in class. The analysis scenario might focus on a problem and a defined set of variables to analyze, as well as tools to visually represent the results. Students will choose their analysis scenarios after the third session, once they have explored multiple datasets and analysis tools.
The course aims to create an environment in which students can practice as they start working on data analysis. Through hands-on sessions, students will gain confidence and develop skills for identifying suitable analysis techniques for complex problems and big data sets. The individual analysis scenario will serve for experimenting with and applying the concepts seen in the course. Students will prepare an academic poster summarising the problem, motivation, methods applied and conclusions, to be presented after the end of the course. Students will also deliver a documented repository with the data, source code and visual outcomes generated for the poster. The course will provide the tools and fundamental concepts needed for the second and third term activities, as well as for improving the analysis developed in the different studios of the Master in City and Technology.

Faculty

Diego Pajarito received his PhD in Geoinformatics as part of a Marie Curie ITN Action, a joint doctorate between the University of Münster, Universitat Jaume I and Universidade Nova de Lisboa (2018), and his MSc in Information and Communication Sciences from Universidad Distrital de Bogotá (2014). He has conducted research on data analytics and spatio-temporal analysis of sustainable development, smart cities and urban systems. Diego has also developed techniques for data collection through mobile devices and crowdsourcing. His interests lie in simplifying data collection and analysis for non-expert audiences, particularly the analysis of spatial distributions. He has lectured courses on spatial analysis, big data and spatial databases in Spain (IAAC, 2019-2020) and Colombia (Universidad Distrital, 2010-2015; Universidad Autonoma de Bucaramanga, 2014). He has also been a consultant for geospatial analysis and high-performance computing for different agencies in Colombia such as the Ministry of Agriculture (2010-2015), Institute of Environmental, meteorological affairs (2010, 2012, 2014), Ministry of Justice (2014), Geographical Institute (2007-2010), among others. Diego is a postdoctoral research fellow at the Institute for Global Sustainable Development at the University of Warwick and seminar faculty of IAAC’s MaCT program.