"Five-room HDB unit at Ang Mo Kio sold for $1.01 mil!"
"Five-room HDB DBSS flat in Bishan sold for record S$1.295m!"
These are the kinds of headlines commonly reported in the Singapore media today.
Are million-dollar HDB resale flats set to become the norm, or is this simply a case of journalistic sensationalism?
What if we could use machine learning algorithms to predict the future price of HDB resale flats?
I had the opportunity to fiddle with the DataRobot Enterprise AI Platform recently.
The first thought that came into my head was to see if it could predict Singapore Pools 4D lottery winning numbers.
Sensibility prevailed.
I opted for the next best thing instead. HDB resale flat prices.
DataRobot touts itself as an "enterprise AI platform that accelerates and democratizes data science by automating the end-to-end journey from data to value. This allows you to deploy trusted AI applications at scale within your organization. DataRobot provides a centrally governed platform that gives you the power of AI to drive better business outcomes and is available on your cloud platform-of-choice, on-premise, or as a fully-managed service".
Well...
Obtaining the dataset required minimal effort. Data.gov.sg hosts a large dataset of HDB resale transaction prices, covering January 1, 1990, to September 9, 2021, at the time of writing.
Smart Nation for the win!
```
RangeIndex: 853645 entries, 0 to 853644
Data columns (total 11 columns):
 #   Column               Non-Null Count   Dtype
---  ------               --------------   -----
 0   month                853645 non-null  object
 1   town                 853645 non-null  object
 2   flat_type            853645 non-null  object
 3   block                853645 non-null  object
 4   street_name          853645 non-null  object
 5   storey_range         853645 non-null  object
 6   floor_area_sqm       853645 non-null  float64
 7   flat_model           853645 non-null  object
 8   lease_commence_date  853645 non-null  int64
 9   resale_price         853645 non-null  float64
 10  remaining_lease      144595 non-null  object
```
month | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | remaining_lease
---|---|---|---|---|---|---|---|---|---|---
2012-03 | ANG MO KIO | 2 ROOM | 172 | ANG MO KIO AVE 4 | 06 TO 10 | 45.0 | Improved | 1986 | 250000.0 | NaN
2012-03 | ANG MO KIO | 3 ROOM | 610 | ANG MO KIO AVE 4 | 06 TO 10 | 68.0 | New Generation | 1980 | 150000.0 | NaN
... | ... | ...
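The data.gov.sg download comes as several CSV files covering different periods, and the low non-null count for remaining_lease (144,595 of 853,645 rows) suggests that column only exists in the more recent extracts. Below is a minimal sketch, assuming you simply want to stack the extracts into one DataFrame; `load_resale_data` is a hypothetical helper, and the file names in the commented usage are placeholders, not the real download names.

```python
import io
from typing import Iterable, Union

import pandas as pd


def load_resale_data(sources: Iterable[Union[str, io.IOBase]]) -> pd.DataFrame:
    """Read several HDB resale CSV extracts (paths or file-like objects)
    and stack them into a single DataFrame.

    Columns missing from an extract, such as remaining_lease in older files,
    are filled with NaN by pd.concat.
    """
    frames = [pd.read_csv(src) for src in sources]
    # sort=False keeps columns in order of first appearance instead of alphabetising them
    return pd.concat(frames, ignore_index=True, sort=False)


# Hypothetical file names; substitute whatever the data.gov.sg download contains.
# df = load_resale_data(["resale-older-extract.csv", "resale-newer-extract.csv"])
# df.info()
```

Because `pd.concat` takes the union of columns, rows from extracts without remaining_lease end up as NaN, which is exactly the pattern in the summary above.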
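There's not much left to do other than some light preprocessing.
Given that HDB flats in Singapore come with a 99-year lease, we can derive the remaining lease by subtracting the flat's age (the current year minus the lease commencement year) from 99. For example, a flat whose lease commenced in 1986 has 99 - (2021 - 1986) = 64 years left as of 2021.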
```python
import pandas as pd

# df holds the combined HDB resale records summarised above
df['datetime'] = pd.to_datetime(df['month'], format="%Y-%m")    # Convert "YYYY-MM" strings to datetime
df['remaining_lease'] = 99 - (2021 - df['lease_commence_date'])  # Remaining lease as of 2021
df[['year', 'month']] = df['month'].str.split("-", expand=True)  # Split year and month
df = df[df['year'] != "2021"]                                    # Remove year 2021 records
df = df.drop(columns=['month', 'year'])                          # Drop the redundant month and year columns
```
datetime | town | flat_type | block | street_name | storey_range | floor_area_sqm | flat_model | lease_commence_date | resale_price | remaining_lease
---|---|---|---|---|---|---|---|---|---|---
2012-03-01 | ANG MO KIO | 2 ROOM | 172 | ANG MO KIO AVE 4 | 06 TO 10 | 45.0 | Improved | 1986 | 250000.0 | 64
2012-03-01 | ANG MO KIO | 3 ROOM | 610 | ANG MO KIO AVE 4 | 06 TO 10 | 68.0 | New Generation | 1980 | 150000.0 | 58
... | ... | ...
Download the full dataset here
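The preprocessing drops the 2021 records, while the prediction table further down compares against actual 2021 transactions, so my reading is that 2021 served as a hold-out year. That is an assumption on my part; DataRobot handles its own partitioning internally. A minimal sketch of such a time-based split, with `split_by_year` as a hypothetical helper:

```python
import pandas as pd


def split_by_year(df: pd.DataFrame, holdout_year: int = 2021):
    """Split transactions into a training set (strictly before holdout_year)
    and a hold-out set (transactions dated in holdout_year)."""
    years = df["datetime"].dt.year
    train = df[years < holdout_year]
    holdout = df[years == holdout_year]
    return train, holdout
```

Splitting by time rather than at random avoids training on transactions that happened after the ones being evaluated, which is closer to how the model would be used for forecasting.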
It's relatively straightforward from here on.
We feed the data to DataRobot and let its algorithm handle the rest.
After training is complete, DataRobot displays a leaderboard ranking the trained machine learning models from most to least accurate.
According to DataRobot's leaderboard, the best-performing model for our dataset is the eXtreme Gradient Boosted Trees Regressor (XGBoost).
XGBoost is typically used for supervised learning problems, where training data with multiple features is used to predict a target variable, which in our case is resale_price.
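DataRobot's XGBoost blueprint and its settings aren't shown here, so I won't guess at them. To illustrate what "gradient boosted trees" means, here is a minimal from-scratch sketch. It is not XGBoost (which adds regularisation, second-order gradients and deeper trees); it uses depth-one trees ("stumps") fit one after another to the residuals under squared error. All function names are hypothetical.

```python
import numpy as np


def fit_stump(x, r):
    """Depth-one regression tree: pick the (feature, threshold) split whose
    two leaf means best fit the residuals r under squared error."""
    best = None
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j])[:-1]:  # every split that leaves both sides non-empty
            left = x[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # every feature is constant: fall back to a single leaf
        return 0, np.inf, r.mean(), r.mean()
    return best[1:]


def predict_stump(stump, x):
    j, t, lv, rv = stump
    return np.where(x[:, j] <= t, lv, rv)


def fit_boosted_stumps(x, y, n_rounds=100, learning_rate=0.1):
    """Squared-error gradient boosting: each new stump is fit to the current
    residuals (the negative gradient) and added with shrinkage."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    pred = np.full(x.shape[0], base)
    for stump in stumps:
        pred = pred + learning_rate * predict_stump(stump, x)
    return pred
```

Outside DataRobot, the practical choice would be a library implementation such as `xgboost.XGBRegressor` or scikit-learn's `GradientBoostingRegressor`; the loop above only shows the core idea of adding small trees that correct the previous ones.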
The algorithm identified that the flat_model input has minimal impact on the model's predictions, presumably because it is correlated with the size of the flat (floor_area_sqm) and its type.
Impressive!
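One common way to judge that an input like flat_model has "minimal impact" is permutation importance: shuffle one column, re-score the model, and see how much the error grows. I'm not claiming this is exactly how DataRobot computes it; the sketch below is a generic version, with `permutation_importance` as a hypothetical helper taking any `predict` function.

```python
from typing import Callable

import numpy as np


def permutation_importance(
    predict: Callable[[np.ndarray], np.ndarray],
    x: np.ndarray,
    y: np.ndarray,
    n_repeats: int = 5,
    seed: int = 0,
) -> np.ndarray:
    """Mean increase in mean squared error when each column of x is shuffled."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((predict(x) - y) ** 2)
    importances = np.zeros(x.shape[1])
    for j in range(x.shape[1]):
        increases = []
        for _ in range(n_repeats):
            x_perm = x.copy()
            # Shuffling column j breaks its link with y while keeping its distribution
            x_perm[:, j] = rng.permutation(x_perm[:, j])
            increases.append(np.mean((predict(x_perm) - y) ** 2) - base_mse)
        importances[j] = np.mean(increases)
    return importances
```

A column whose score stays near zero is one the model barely relies on, which is the pattern the flat_model finding describes.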
So how did it fare?
Splendid! Some predictions (marked with an asterisk below) were way off, but most came pretty close.
actual_resale_price | predicted_price | town | block | street_name | storey_range | floor_area_sqm | flat_type | flat_model | lease_commence_date | remaining_lease | datetime
---|---|---|---|---|---|---|---|---|---|---|---
1210000.0 | 1019587.3125* | BISHAN | 273A | BISHAN ST 24 | 25 TO 27 | 120.0 | 5 ROOM | DBSS | 2011 | 89 | 2021-01-01
1030000.0 | 971905.625 | BISHAN | 273B | BISHAN ST 24 | 10 TO 12 | 120.0 | 5 ROOM | DBSS | 2011 | 89 | 2021-03-01
1028000.01 | 789750.625* | BISHAN | 134 | BISHAN ST 12 | 01 TO 03 | 144.0 | EXECUTIVE | Apartment | 1986 | 64 | 2021-01-01
1008000.0 | 992980.125 | BUKIT MERAH | 10A | BOON TIONG RD | 31 TO 33 | 93.0 | 4 ROOM | Model A | 2016 | 94 | 2021-06-01
1018000.0 | 996907.4375 | CENTRAL AREA | 1C | CANTONMENT RD | 28 TO 30 | 97.0 | 4 ROOM | Type S1 | 2011 | 89 | 2021-02-01
1025000.0 | 1026222.75 | CENTRAL AREA | 1F | CANTONMENT RD | 37 TO 39 | 95.0 | 4 ROOM | Type S1 | 2011 | 89 | 2021-05-01
1030000.0 | 783793.125* | CLEMENTI | 312B | CLEMENTI AVE 4 | 25 TO 27 | 113.0 | 5 ROOM | Improved | 2017 | 95 | 2021-08-01
1050000.0 | 1035908.1875 | QUEENSTOWN | 150 | MEI LING ST | 13 TO 15 | 146.0 | EXECUTIVE | Maisonette | 1995 | 73 | 2021-02-01
... | ... | ...
Download the full dataset here
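To put "way off" and "pretty close" into numbers, one option is the absolute percentage error per transaction. The sketch below computes it for the eight rows in the table; the 10% cut-off for flagging a miss is my own hypothetical choice, and the average covers only these rows, not the full hold-out set.

```python
import numpy as np


def absolute_percentage_errors(actual, predicted) -> np.ndarray:
    """Absolute error as a percentage of the actual price, per transaction."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(predicted - actual) / actual * 100


# The eight rows from the table above: actual resale price and model prediction.
actual = [1210000.0, 1030000.0, 1028000.01, 1008000.0,
          1018000.0, 1025000.0, 1030000.0, 1050000.0]
predicted = [1019587.3125, 971905.625, 789750.625, 992980.125,
             996907.4375, 1026222.75, 783793.125, 1035908.1875]

ape = absolute_percentage_errors(actual, predicted)
flagged = np.flatnonzero(ape > 10)  # hypothetical 10% threshold for a "way off" prediction
print(f"MAPE: {ape.mean():.1f}%")                   # prints: MAPE: 9.2%
print("rows off by more than 10%:", flagged.tolist())  # prints: [0, 2, 6]
```

With that threshold, the flagged rows are the same three that carry an asterisk, each missing by roughly 16% to 24%, while the other five are within about 6%.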
For kicks, I extracted a subset of the data to forecast potential HDB resale prices in 2025.
Well, if the predictions are accurate, ceteris paribus, your HDB flat is a depreciating asset.
I'll leave it to your own judgement.
resale_price_in_2021 | predicted_price | town | block | street_name | storey_range | floor_area_sqm | flat_type | flat_model | lease_commence_date | remaining_lease | datetime
---|---|---|---|---|---|---|---|---|---|---|---
233000.0 | 201131.21875 | ANG MO KIO | 406 | ANG MO KIO AVE 10 | 04 TO 06 | 44.0 | 2 ROOM | Improved | 1979 | 53 | 2025-09-01
330000.0 | 271170.1875 | BEDOK | 126 | BEDOK NTH ST 2 | 07 TO 09 | 67.0 | 3 ROOM | New Generation | 1978 | 52 | 2025-09-01
520000.0 | 477625.8125 | BISHAN | 109 | BISHAN ST 12 | 07 TO 09 | 84.0 | 4 ROOM | Simplified | 1985 | 59 | 2025-09-01
... | ... | ...
Download the full dataset here
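The post doesn't spell out how the 2025 rows were prepared. Judging by the table, every row is dated 2025-09-01 and remaining_lease matches 99 - (2025 - lease_commence_date), e.g. 99 - (2025 - 1979) = 53. Under that assumption, a sketch of building such a scoring set from existing transactions (`make_future_frame` is a hypothetical helper):

```python
import pandas as pd

LEASE_YEARS = 99  # HDB flats come with a 99-year lease


def make_future_frame(df: pd.DataFrame, year: int, month: int = 9) -> pd.DataFrame:
    """Copy existing transactions, re-date them to the first of the given
    month/year, and recompute remaining_lease as of that year."""
    future = df.copy()  # leave the original rows untouched
    future["datetime"] = pd.Timestamp(year=year, month=month, day=1)
    future["remaining_lease"] = LEASE_YEARS - (year - future["lease_commence_date"])
    return future
```

Everything else about each flat stays the same, so any change between its 2021 price and the predicted price comes from the later date and the shorter remaining lease.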
I was pleasantly surprised by the findings of this little experiment.
I started out doubtful but walked away thoroughly impressed.
I've barely begun to scratch the surface of what DataRobot is capable of.
The question is, will it put data scientists out of a job?
Nah.
The DataRobot platform is primarily a tool for organisations to automate and scale AI applications.
There are business and development challenges that require human intervention.
Though if it were a hammer, it would be a bloody good hammer.
A compelling insight was uncovered during the data exploration phase.
Flat Type | Total Transactions | Transactions > S$1m | % of Transactions
---|---|---|---
4 ROOM | 9647 | 13 | 0.13%
5 ROOM | 5984 | 54 | 0.90%
EXECUTIVE | 1882 | 10 | 0.53%

Flat Type | Total Transactions | Transactions > S$1m | % of Transactions
---|---|---|---
3 ROOM | 4229 | 4 | 0.09%
4 ROOM | 8449 | 21 | 0.25%
5 ROOM | 5350 | 92 | 1.72%
EXECUTIVE | 1543 | 26 | 1.69%
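Counts like these come from a simple group-by over the transactions. A minimal sketch, assuming a DataFrame with flat_type and resale_price columns and a strict "greater than S$1 million" cut-off as in the table headers; `million_dollar_share` is a hypothetical helper, and any year filtering would be done before calling it.

```python
import pandas as pd


def million_dollar_share(df: pd.DataFrame, threshold: float = 1_000_000) -> pd.DataFrame:
    """Per flat type: number of transactions, number priced above threshold,
    and that number as a percentage of the total (rounded to 2 decimals)."""
    summary = df.groupby("flat_type")["resale_price"].agg(
        total="count",  # transactions with a recorded price
        above=lambda s: int((s > threshold).sum()),
    )
    summary["pct"] = (summary["above"] / summary["total"] * 100).round(2)
    return summary
```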
Out of curiosity, I searched for one of the units online and found an old video.
A few key takeaways based on the information presented:
Of course, a headline that reads "Only 143 well-renovated HDB resale flats in prime locations were sold for over S$1m out of 19694 transactions in 2021" is less sensational than "Five-room HDB DBSS flat in Bishan sold for record S$1.295m!" or "106 HDB resale flats sold for over S$1m in first half of 2021, over 4-fold increase year-on-year!".