Santorini AI’sland
ABSTRACT
Santorini is one of the Cyclades islands in the Aegean Sea, located 200 km southeast of the Greek mainland. In the 16th century BC it was the site of one of the largest volcanic eruptions in recorded history, which permanently shaped its rugged landscape. Since the 1970s it has become famous all over the world for its unique landscape and distinctive architecture, and its tourism has grown rapidly, making it one of the top travel destinations in the world today.
To understand the features that define the price range and star ratings of the very large number of hotels on the island, we use machine learning algorithms to analyze the correlations between those features. Furthermore, using regression models, we predict complex outcomes such as rental prices.
01. Project description
The main characteristics used to identify the building performance metrics are building size and typology, location, amenities, view score, solar irradiation and accessibility.
02. Data generation
For our data generation we started by gathering data from Airbnb, booking.com and other booking websites. We then took the descriptive data, together with the coordinates, into Grasshopper to generate geometric data and analytics. Lastly, we used noise augmentation to increase the size of the dataset.
In our Grasshopper file we recreated each scenario to resemble the real situation as closely as possible. Because OSM (OpenStreetMap) holds only limited data on the buildings, we had to make some assumptions while staying as close to the real data as possible.
03. Analyzing inputs
04. PCA & FAMD Analysis
The reasons why we proceeded with PCA (Principal Component Analysis) and FAMD (Factor Analysis of Mixed Data) are the following:
• find features that are able to differentiate data points from each other
• have ground-truth signals, avoid ambiguity
• dimensionality reduction: find ways to express the same amount of information with fewer, but more information-rich, variables (features)
• find patterns in the features of the data
• visualization of high-dimensional data
• pre-processing before supervised ML tasks (complexity and noise reduction)
• avoid over- and underfitting in our model
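As a rough illustration of the dimensionality-reduction point, the sketch below finds the first principal component of a small numeric table using only the Python standard library. The three columns (floor area, view score, price per night) and all values are made up for illustration and are not the project's dataset. A real analysis would more likely rely on a dedicated library, and FAMD additionally handles categorical columns, which this sketch does not attempt.

```python
import math

def standardize(rows):
    """Center each column to mean 0 and scale it to unit sample variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Power iteration: unit eigenvector and eigenvalue of the largest eigenvalue."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue

# Hypothetical rows: [floor area in m2, view score, price per night in EUR].
rows = [
    [40, 2, 90],
    [55, 3, 120],
    [70, 3, 150],
    [85, 4, 200],
    [100, 5, 260],
]

z = standardize(rows)
cov = covariance(z)
component, eigenvalue = first_component(cov)

# Share of the total variance captured by the first component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# One number per row: the projection of each sample onto the component.
scores = [sum(x * w for x, w in zip(row, component)) for row in z]

print("first component:", [round(w, 3) for w in component])
print("explained variance ratio:", round(explained, 3))
```

Because the made-up columns are strongly correlated, a single component captures most of the variance here; with weakly related features more components would be needed.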
05. Linear Regression
This is our first attempt at predicting the price range, training a shallow learning model with just 3 dense layers. The model is clearly underfitting, as shown by the large gap in loss between the training and validation curves. This means the model cannot learn from the training data. A possible reason is the small size of the dataset.
06. Data augmentation & comparisons
For our data augmentation we duplicated the data iteratively and added random noise to the numerical values, equivalent to 5% of the original value.
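A minimal sketch of this kind of noise augmentation is shown below. It assumes the noise is drawn uniformly within ±5% of each original value, which is only one possible reading of the description above; the field names (`area_m2`, `price_eur`), the sample rows and the helper `augment_with_noise` are hypothetical.

```python
import random

def augment_with_noise(rows, numeric_keys, copies=1, noise=0.05, seed=0):
    """Return the original rows plus `copies` noisy duplicates of each row.

    Each numeric value v becomes v * (1 + u), with u drawn uniformly from
    [-noise, +noise] (an assumed distribution); other fields are copied as-is.
    """
    rng = random.Random(seed)  # fixed seed so the augmentation is reproducible
    augmented = [dict(row) for row in rows]
    for _ in range(copies):
        for row in rows:
            noisy = dict(row)
            for key in numeric_keys:
                noisy[key] = row[key] * (1 + rng.uniform(-noise, noise))
            augmented.append(noisy)
    return augmented

# Hypothetical listings, for illustration only.
hotels = [
    {"name": "A", "area_m2": 45.0, "price_eur": 120.0},
    {"name": "B", "area_m2": 80.0, "price_eur": 310.0},
]

augmented = augment_with_noise(hotels, ["area_m2", "price_eur"], copies=3)
print(len(hotels), "->", len(augmented), "rows")
```

Whether the target value (here the price) should be perturbed together with the input features is a design choice; this sketch perturbs every listed numeric field.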
In the first row, where we trained the same model architecture on the augmented data with 2000 samples, there is a clear and significant improvement in the model.
It is now able to learn from the training data, and we have also managed to minimize the loss function.
To improve it further, we trained again, this time using 4500 samples. The mean squared error improved, but the model is now overfitting: the training loss is lower than the validation loss.
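For reference, the comparison described here can be sketched in a few lines: compute the mean squared error on the training and validation sets and look at the difference. The prices and predictions below are invented for illustration, and `generalization_gap` is a hypothetical helper name, not something from the project.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def generalization_gap(train_loss, val_loss):
    """Validation loss minus training loss; a growing positive gap hints at overfitting."""
    return val_loss - train_loss

# Hypothetical prices (EUR per night) and model predictions, for illustration only.
train_true, train_pred = [120.0, 200.0, 310.0], [118.0, 203.0, 309.0]
val_true, val_pred = [150.0, 260.0], [170.0, 235.0]

train_mse = mean_squared_error(train_true, train_pred)
val_mse = mean_squared_error(val_true, val_pred)
gap = generalization_gap(train_mse, val_mse)
print(f"train MSE={train_mse:.1f}, validation MSE={val_mse:.1f}, gap={gap:.1f}")
```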
To improve the model, we introduced dropout to the layer to minimize overfitting. We also ran the model for 400 epochs to check whether the curves converge and flatten.
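To make the idea of dropout concrete, here is a small standard-library sketch of "inverted" dropout applied to a list of activations. It is not the project's model code: the dropout rate of 0.5 and the activation values are assumptions chosen for illustration, and in practice a deep learning framework's built-in dropout layer would be used instead.

```python
import random

def dropout(activations, rate, training, rng):
    """Inverted dropout on a list of activations.

    While training, each value is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate) so the expected sum is unchanged.
    At inference time the values pass through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(42)  # seeded for reproducibility
layer_output = [1.0, 2.0, 3.0, 4.0]  # hypothetical activations

train_out = dropout(layer_output, rate=0.5, training=True, rng=rng)
infer_out = dropout(layer_output, rate=0.5, training=False, rng=rng)
print("training :", train_out)
print("inference:", infer_out)
```

Randomly silencing units during training discourages the network from relying on any single activation, which is why it is commonly used against overfitting.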
07. Conclusions
Data augmentation improves the performance and outcome of the ML model. It allows you to artificially expand a dataset and so address data scarcity by leveraging the limited, labeled dataset you already have.
However, its benefit also depends on how well the base dataset covers the diverse situations within the problem.
Adding slight variations (in our case, noise of 5% of the original value) improves the generalizability of the model. Data augmentation reduces the risk of overfitting.
Santorini AI’sland is a project of IAAC, Institute for Advanced Architecture of Catalonia developed in the Master of Advanced Computation in Architecture and Design 2021/22 by Students: Neil John Bersabe and Maria Papadimitraki, Faculty: Gabriella Rossi, Faculty assistant: Hesham Shawqi