Using DataRobot to predict the price of million-dollar HDB resale flats

Domo arigato, Mr. Roboto

Introduction

"Five-room HDB unit at Ang Mo Kio sold for $1.01 mil!"

"Five-room HDB DBSS flat in Bishan sold for record S$1.295m!"

This is the kind of breaking news commonly reported in the Singapore media today.

Will million-dollar HDB resale flats become the norm, or is this simply a case of journalistic sensationalism?

What if we could use machine learning algorithms to predict the future price of HDB resale flats?

Context

I had the opportunity to fiddle with the DataRobot Enterprise AI Platform recently.

The first thought that came into my head was to see if it could predict Singapore Pools 4D lottery winning numbers.

Sensibility prevailed.

I opted for the next best thing instead. HDB resale flat prices.

What is DataRobot?

DataRobot touts itself as an "enterprise AI platform that accelerates and democratizes data science by automating the end-to-end journey from data to value. This allows you to deploy trusted AI applications at scale within your organization. DataRobot provides a centrally governed platform that gives you the power of AI to drive better business outcomes and is available on your cloud platform-of-choice, on-premise, or as a fully-managed service".

Well...

Gentlemen, you had my curiosity ... but now you have my attention.

Data preparation

Obtaining the dataset required minimal effort. Data.gov.sg has a huge dataset of HDB resale transacted prices from January 1, 1990, to September 9, 2021, at the time of writing.

Smart Nation for the win!

RangeIndex: 853645 entries, 0 to 853644
Data columns (total 11 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   month                853645 non-null  object 
 1   town                 853645 non-null  object 
 2   flat_type            853645 non-null  object 
 3   block                853645 non-null  object 
 4   street_name          853645 non-null  object 
 5   storey_range         853645 non-null  object 
 6   floor_area_sqm       853645 non-null  float64
 7   flat_model           853645 non-null  object 
 8   lease_commence_date  853645 non-null  int64  
 9   resale_price         853645 non-null  float64
 10  remaining_lease      144595 non-null  object 

month    town        flat_type  block  street_name       storey_range  floor_area_sqm  flat_model      lease_commence_date  resale_price  remaining_lease
2012-03  ANG MO KIO  2 ROOM     172    ANG MO KIO AVE 4  06 TO 10      45.0            Improved        1986                 250000.0      NaN
2012-03  ANG MO KIO  3 ROOM     610    ANG MO KIO AVE 4  06 TO 10      68.0            New Generation  1980                 150000.0      NaN
...
A sample dataset

There's not much left to do other than:

  • Transform the month values to a suitable DateTime format for time series forecasting and training.
  • Calculate the remaining lease for each row.
  • Remove the records for 2021. We will be using these records for prediction testing later.

Given that all HDB flats in Singapore have a maximum lease of 99 years, we can derive the remaining lease by subtracting the age of the flat (the current year minus the lease commencement date) from 99.

>>> import pandas as pd
>>> df['datetime'] = pd.to_datetime(df['month'], format="%Y-%m")  # Convert to DateTime
>>> df['remaining_lease'] = 99 - (2021 - df['lease_commence_date'])  # Remaining lease as of 2021
>>> df[['year', 'month']] = df['month'].str.split("-", expand=True)  # Split year and month
>>> df = df[df['year'] != "2021"]  # Remove year 2021 records from DataFrame
>>> df = df.drop(columns=['month', 'year'])  # Drop the redundant month and year columns
datetime    town        flat_type  block  street_name       storey_range  floor_area_sqm  flat_model      lease_commence_date  resale_price  remaining_lease
2012-03-01  ANG MO KIO  2 ROOM     172    ANG MO KIO AVE 4  06 TO 10      45.0            Improved        1986                 250000.0      64
2012-03-01  ANG MO KIO  3 ROOM     610    ANG MO KIO AVE 4  06 TO 10      68.0            New Generation  1980                 150000.0      58
...
Much better!
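
Note that the snippet above drops the 2021 rows from df altogether. To have them on hand for prediction testing later, they need to be set aside first. Here is a minimal sketch of one way to do that, assuming a tiny made-up DataFrame in place of the real one. The names train_df and holdout_df are my own, for illustration only.

```python
import pandas as pd

# Hypothetical three-row stand-in for the real dataset (values are illustrative).
df = pd.DataFrame({
    "month": ["2020-11", "2021-01", "2021-03"],
    "lease_commence_date": [1986, 2011, 2011],
    "resale_price": [250000.0, 1210000.0, 1030000.0],
})

df["datetime"] = pd.to_datetime(df["month"], format="%Y-%m")

# Split on the transaction year: 2021 rows become the holdout set.
is_2021 = df["datetime"].dt.year == 2021
train_df = df[~is_2021].copy()    # rows to upload for training
holdout_df = df[is_2021].copy()   # rows kept back for prediction testing

print(len(train_df), len(holdout_df))
```

Taking a .copy() of each slice avoids pandas' SettingWithCopyWarning if columns are added or dropped afterwards.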

Download the full dataset here

Prediction time

It's relatively straightforward from here on.

We feed the data to DataRobot and let its algorithm handle the rest.

Data upload

Automated backtesting. Nice!
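
For anyone unfamiliar with backtesting: the idea is to train on an earlier stretch of the time series and validate on the period right after it, several times over. DataRobot sets this up automatically and I have not looked at exactly how it partitions the data, so the sketch below is a generic rolling-origin split in plain Python, not DataRobot's scheme. The function name and parameters are my own.

```python
def rolling_origin_splits(periods, n_backtests, validation_size):
    """Return (train, validation) pairs; each fold validates on a later window."""
    splits = []
    for fold in range(n_backtests, 0, -1):
        end = len(periods) - (fold - 1) * validation_size
        start = end - validation_size
        if start <= 0:
            raise ValueError("not enough periods for the requested backtests")
        # Everything before the validation window is used for training.
        splits.append((periods[:start], periods[start:end]))
    return splits


years = list(range(2012, 2021))  # illustrative: 2012 to 2020
for train, validation in rolling_origin_splits(years, n_backtests=3, validation_size=1):
    print(f"train {train[0]}-{train[-1]} -> validate {validation}")
```

Each fold only ever validates on data that comes after its training window, which is what keeps a time series evaluation honest.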

Set a target and click "Start" to begin training

After the training is complete, DataRobot displays a leaderboard that ranks the machine learning models from most to least accurate.

Recommended models

According to DataRobot's algorithm, the most suitable model for our dataset is the eXtreme Gradient Boosted Trees Regressor (XGBoost).

XGBoost is typically used for supervised learning problems, where training data with multiple features is used to predict a target variable, which in our case is resale_price.
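
For a feel of what gradient boosting does under the hood, here is a toy sketch in plain Python. It is not XGBoost itself and certainly not what DataRobot runs: it boosts simple one-split trees (stumps) on squared error, and the six rows of floor area, remaining lease and price are made-up values for illustration. All function names are my own.

```python
# Toy gradient boosting with one-split trees (stumps). Illustrative only:
# this is NOT XGBoost or DataRobot's implementation, and the data is made up.

def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) with the lowest squared error."""
    best = None
    for feature in range(len(X[0])):
        # Skip the largest value so the right-hand side is never empty.
        for threshold in sorted({row[feature] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[feature] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[feature] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, feature, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean


def fit_boosted(X, y, n_rounds=50, learning_rate=0.3):
    """Start from the mean price, then repeatedly fit a stump to the remaining errors."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [target - pred for target, pred in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        predictions = [pred + learning_rate * predict_stump(stump, row)
                       for pred, row in zip(predictions, X)]
    return base, stumps, learning_rate


def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def mean_squared_error(y, predictions):
    return sum((a - p) ** 2 for a, p in zip(y, predictions)) / len(y)


# Hypothetical rows: [floor_area_sqm, remaining_lease] -> price in S$ thousands.
X = [[45, 64], [68, 58], [84, 59], [93, 94], [120, 89], [144, 64]]
y = [250, 330, 520, 1008, 1210, 1028]

model = fit_boosted(X, y)
baseline_mse = mean_squared_error(y, [sum(y) / len(y)] * len(y))
boosted_mse = mean_squared_error(y, [predict(model, row) for row in X])
print(f"MSE predicting the mean: {baseline_mse:.0f}")
print(f"MSE after boosting:      {boosted_mse:.0f}")
```

Each round nudges the predictions a little closer to the target by fitting the errors left over from the previous rounds. Real XGBoost adds regularisation, deeper trees and a lot of engineering on top of this basic loop.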

Feature impact

The algorithm accurately identified that the flat_model input has minimal impact on the outcome of the model, presumably due to the correlation between the size of the flat (floor_area_sqm) and its type.

Impressive!

Predicting million-dollar HDB flats

So how did it fare?

Splendid! Some predictions were way off but most of them came pretty close.

actual_resale_price  predicted_price  town          block  street_name     storey_range  floor_area_sqm  flat_type  flat_model  lease_commence_date  remaining_lease  datetime
1210000.0            1019587.3125*    BISHAN        273A   BISHAN ST 24    25 TO 27      120.0           5 ROOM     DBSS        2011                 89               2021-01-01
1030000.0            971905.625       BISHAN        273B   BISHAN ST 24    10 TO 12      120.0           5 ROOM     DBSS        2011                 89               2021-03-01
1028000.01           789750.625*      BISHAN        134    BISHAN ST 12    01 TO 03      144.0           EXECUTIVE  Apartment   1986                 64               2021-01-01
1008000.0            992980.125       BUKIT MERAH   10A    BOON TIONG RD   31 TO 33      93.0            4 ROOM     Model A     2016                 94               2021-06-01
1018000.0            996907.4375      CENTRAL AREA  1C     CANTONMENT RD   28 TO 30      97.0            4 ROOM     Type S1     2011                 89               2021-02-01
1025000.0            1026222.75       CENTRAL AREA  1F     CANTONMENT RD   37 TO 39      95.0            4 ROOM     Type S1     2011                 89               2021-05-01
1030000.0            783793.125*      CLEMENTI      312B   CLEMENTI AVE 4  25 TO 27      113.0           5 ROOM     Improved    2017                 95               2021-08-01
1050000.0            1035908.1875     QUEENSTOWN    150    MEI LING ST     13 TO 15      146.0           EXECUTIVE  Maisonette  1995                 73               2021-02-01
...
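
To put a number on "pretty close", here is a rough sketch that computes the absolute percentage error for a few rows copied from the table above. The helper name and the choice of rows are mine. This is a quick sanity check, not the accuracy metric DataRobot reports.

```python
# (label, actual_resale_price, predicted_price) copied from the table above.
rows = [
    ("BISHAN 273A", 1210000.0, 1019587.3125),
    ("BISHAN 273B", 1030000.0, 971905.625),
    ("CENTRAL AREA 1F", 1025000.0, 1026222.75),
    ("CLEMENTI 312B", 1030000.0, 783793.125),
]


def absolute_percentage_error(actual, predicted):
    """How far off the prediction is, as a percentage of the actual price."""
    return abs(actual - predicted) / actual * 100


errors = [absolute_percentage_error(actual, predicted) for _, actual, predicted in rows]
for (label, _, _), error in zip(rows, errors):
    print(f"{label}: {error:.1f}% off")

mape = sum(errors) / len(errors)
print(f"Mean absolute percentage error: {mape:.1f}%")
```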

Download the full dataset here

Forecasting 2025 prices

For kicks, I extracted a subset of the data to forecast potential HDB resale prices in 2025.

Well, if the predictions are accurate, ceteris paribus, your HDB flat is a depreciating asset.

I'll leave it to your own judgement.

resale_price_in_2021  predicted_price  town        block  street_name        storey_range  floor_area_sqm  flat_type  flat_model      lease_commence_date  remaining_lease  datetime
233000.0              201131.21875     ANG MO KIO  406    ANG MO KIO AVE 10  04 TO 06      44.0            2 ROOM     Improved        1979                 53               2025-09-01
330000.0              271170.1875      BEDOK       126    BEDOK NTH ST 2     07 TO 09      67.0            3 ROOM     New Generation  1978                 52               2025-09-01
520000.0              477625.8125      BISHAN      109    BISHAN ST 12       07 TO 09      84.0            4 ROOM     Simplified      1985                 59               2025-09-01
...
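
To see how steep the forecast drop is, the sketch below works out the overall and annualised change for the three rows shown above. Treating 2021 to 2025 as exactly four years is my own simplifying assumption, as are the labels.

```python
# (resale_price_in_2021, predicted_price) pairs copied from the table above.
rows = {
    "ANG MO KIO 2 ROOM": (233000.0, 201131.21875),
    "BEDOK 3 ROOM": (330000.0, 271170.1875),
    "BISHAN 4 ROOM": (520000.0, 477625.8125),
}
YEARS = 4  # assumption: 2021 to 2025 treated as four full years

changes = {}
for label, (price_2021, predicted_2025) in rows.items():
    ratio = predicted_2025 / price_2021
    total = ratio - 1                      # overall change over the period
    annualised = ratio ** (1 / YEARS) - 1  # equivalent compounded yearly change
    changes[label] = (total, annualised)
    print(f"{label}: {total:.1%} overall, about {annualised:.1%} per year")
```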

Download the full dataset here

Conclusion

I was pleasantly surprised by the findings of this little experiment.

It started doubtfully but I walked away thoroughly impressed.

I've barely begun to scratch the surface of what DataRobot is capable of.

The question is, will it put data scientists out of a job?

Nah.

The DataRobot platform is primarily a tool for organisations to automate and scale AI applications.

There are business and development challenges that require human intervention.

Though if it were a hammer, it would be a bloody good hammer.

Further Observations

A compelling insight was uncovered during the data exploration phase.

Number of HDB Resale Flats sold for over $1 million in 2020

  • Total Transactions above 1 million: 77
  • Total Transactions less than 1 million: 23252
  • Minimum Floor Area: 92.0
  • Maximum Floor Area: 178.0
  • Minimum Resale Price: $1,000,188
  • Maximum Resale Price: $1,258,000
  • Flat Types: 4 ROOM, 5 ROOM, EXECUTIVE
  • Locations:
    • ANG MO KIO (2)
    • BISHAN (14)
    • BUKIT MERAH (12)
    • CENTRAL AREA (22)
    • CLEMENTI (3)
    • GEYLANG (1)
    • KALLANG/WHAMPOA (2)
    • QUEENSTOWN (14)
    • TOA PAYOH (7)
Flat Type  Total Transactions  Transactions > $1 million  % of Transactions
4 ROOM     9647                13                         0.135%
5 ROOM     5984                54                         0.9%
EXECUTIVE  1882                10                         0.53%

Number of HDB Resale Flats sold for over $1 million in 2021

  • Total Transactions above 1 million: 143
  • Total Transactions less than 1 million: 19694
  • Minimum Floor Area: 93.0
  • Maximum Floor Area: 243.0
  • Minimum Resale Price: $1,001,000
  • Maximum Resale Price: $1,295,000
  • Flat Types: 3 ROOM*, 4 ROOM, 5 ROOM, EXECUTIVE
  • Locations:
    • ANG MO KIO (1)
    • BISHAN (23)
    • BUKIT MERAH (12)
    • BUKIT TIMAH (10)
    • CENTRAL AREA (22)
    • CLEMENTI (8)
    • KALLANG/WHAMPOA (9)
    • QUEENSTOWN (23)
    • SERANGOON (3)
    • TOA PAYOH (13)
* Rare HDB 3-room terrace house
Flat Type  Total Transactions  Transactions > $1 million  % of Transactions
3 ROOM     4229                4                          0.09%
4 ROOM     8449                21                         0.25%
5 ROOM     5350                92                         1.72%
EXECUTIVE  1543                26                         1.69%
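
The percentages in the two tables above are simply the million-dollar count divided by the total for each flat type. Here is a small sketch that recomputes them from the counts listed above, along with the overall share for each year. The dictionary layout is my own.

```python
# Counts copied from the two tables above: {year: {flat_type: (total, above_1m)}}
counts = {
    2020: {"4 ROOM": (9647, 13), "5 ROOM": (5984, 54), "EXECUTIVE": (1882, 10)},
    2021: {"3 ROOM": (4229, 4), "4 ROOM": (8449, 21),
           "5 ROOM": (5350, 92), "EXECUTIVE": (1543, 26)},
}
# Year-level counts from the bullet lists: (above_1m, below_1m)
overall = {2020: (77, 23252), 2021: (143, 19694)}

shares = {}
for year, by_type in counts.items():
    for flat_type, (total, above) in by_type.items():
        shares[(year, flat_type)] = above / total
        print(f"{year} {flat_type}: {above / total:.2%}")
    above, below = overall[year]
    shares[(year, "ALL")] = above / (above + below)
    print(f"{year} all flat types: {shares[(year, 'ALL')]:.2%}")
```

Even in 2021, fewer than one in a hundred resale transactions crossed the million-dollar mark.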

Total Transactions above SGD 1 Million 2020

Total Transactions above SGD 1 Million 2021

Out of curiosity, I searched for one of the units online and found an old video.


Ladies and Gentlemen, this is what a one-million-dollar HDB resale flat looks like.

A few key takeaways based on the information presented:

  • There was only a slight increase in the prices of million-dollar resale flats between 2020 and 2021.
  • The probability of most resale flats exceeding a million dollars soon is slim.
  • The probability of you profiting a million dollars off your resale flat is even slimmer.
  • Prices are primarily driven by market forces. Buyers want to buy low. Sellers want to sell high.

Of course, a headline that reads "Only 143 well-renovated HDB resale flats in prime locations out of 19694 transactions in 2021 were sold for over S$1m" is less sensational than "Five-room HDB DBSS flat in Bishan sold for record S$1.295m!" or "106 HDB resale flats sold for over S$1m in first half of 2021, over 4-fold increase year-on-year!".

Copyright © Terence Lucas Yap
