11. House Rent Prediction

수달이네 기술 블로그

슬픈 수달이 2025. 12. 29. 23:16

Kaggle link: https://www.kaggle.com/datasets/iamsouravbanerjee/house-rent-prediction-dataset

Dataset structure

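BHK: number of Bedrooms, Halls, and Kitchens

Rent: rent of the house/apartment/flat

Size: size of the house/apartment/flat, in square feet

Floor: the floor the house/apartment/flat is on and the building's total number of floors (e.g. "1 out of 2", "3 out of 5")

Area Type: how the size is measured (Super Area, Carpet Area, or Built Area)

Area Locality: the neighborhood where the house/apartment/flat is located

City: the city where the house/apartment/flat is located

Furnishing Status: Furnished, Semi-Furnished, or Unfurnished

Tenant Preferred: the type of tenant the owner or agent prefers

Bathroom: number of bathrooms

Point of Contact: whom to contact for more information about the house/apartment/flat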

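import pandas as pd

# Load the Kaggle House Rent dataset (path relative to the notebook)
df = pd.read_csv('./data/House_Rent_Dataset.csv')
df  # in a notebook, evaluating df displays the frame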


Posted On	BHK	Rent	Size	Floor	Area Type	Area Locality	City	Furnishing Status	Tenant Preferred	Bathroom	Point of Contact
2022-05-18	2	10000	1100	Ground out of 2	Super Area	Bandel	Kolkata	Unfurnished	Bachelors/Family	2	Contact Owner
2022-05-13	2	20000	800	1 out of 3	Super Area	Phool Bagan, Kankurgachi	Kolkata	Semi-Furnished	Bachelors/Family	1	Contact Owner
2022-05-16	2	17000	1000	1 out of 3	Super Area	Salt Lake City Sector 2	Kolkata	Semi-Furnished	Bachelors/Family	1	Contact Owner

Goal

Predict the rent from the other features.
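Stated in code, the goal is a regression with Rent as the target and the remaining columns as features. A minimal sketch on three rows copied from the tables in this post; the names `X` and `y` are just the usual convention, not something from the original notebook.

```python
import pandas as pd

# Three rows copied from the tables in this post (only a few columns kept)
df = pd.DataFrame({
    "BHK": [2, 2, 3],
    "Size": [1100, 800, 2000],
    "City": ["Kolkata", "Kolkata", "Hyderabad"],
    "Rent": [10000, 20000, 29000],
})

# Rent is the regression target; every other column is a candidate feature
y = df["Rent"]
X = df.drop(columns=["Rent"])

print(X.columns.tolist())
print(y.tolist())
```

On the full dataset the same two lines split all 11 remaining columns from Rent; the categorical ones still need encoding before a model can use them.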

Data analysis

round(df.describe(), 2)

	BHK	Rent	Size	Bathroom
count	4746.00	4746.00	4746.00	4746.00
mean	2.08	34993.45	967.49	1.97
std	0.83	78106.41	634.20	0.88
min	1.00	1200.00	10.00	1.00
25%	2.00	10000.00	550.00	1.00
50%	2.00	16000.00	850.00	2.00
75%	3.00	33000.00	1200.00	2.00
max	6.00	3500000.00	8000.00	10.00

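This summarizes the numeric columns. Wrapping describe() in round(..., 2) rounds every statistic to two decimal places, which makes the table easier to read. Note how far Rent's max (3,500,000) sits from its 75th percentile (33,000): the distribution has a very long right tail.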
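round() rounds rather than truncates, which can matter in the last digit. A small sketch on a made-up three-value column (not the Kaggle data): its sample standard deviation is about 1.5275, which rounds to 1.53 but would truncate to 1.52. It also shows that the built-in round() on a DataFrame gives the same result as the DataFrame.round method.

```python
import numpy as np
import pandas as pd

# Tiny illustrative column (not the Kaggle data)
data = pd.DataFrame({"x": [1.0, 2.0, 4.0]})
summary = data.describe()

rounded = round(summary, 2)                 # built-in round() dispatches to DataFrame.round
same = summary.round(2)                     # equivalent method call
truncated = np.trunc(summary * 100) / 100   # what "cutting off" the digits would give instead

print(rounded.loc["std", "x"])    # 1.53
print(truncated.loc["std", "x"])  # 1.52
```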

Missing values

df.isna().sum()

Posted On            0
BHK                  0
Rent                 0
Size                 0
Floor                0
Area Type            0
Area Locality        0
City                 0
Furnishing Status    0
Tenant Preferred     0
Bathroom             0
Point of Contact     0

No column has missing values.

BHK

import seaborn as sns
sns.displot(df['BHK'])

Rent

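In the Rent distribution plot, the values are bunched up on the left with a long tail to the right (right-skewed), and values past 0.5 on the x-axis look very rare (the axis is presumably in units of 1e6, so roughly 500,000).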
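The "bunched on the left, long tail on the right" shape can also be checked numerically: a positive skewness and a mean well above the median (34,993 vs 16,000 for Rent in the describe() table above) both point to a right tail. A minimal sketch on illustrative values (rents taken from the tables in this post plus the 3,500,000 maximum); on the real data the same methods would be called on `df['Rent']`.

```python
import pandas as pd

# Illustrative rents: mostly small values plus one extreme one, mimicking a long right tail
rent = pd.Series([7500, 10000, 10000, 15000, 17000, 20000, 29000, 3500000])

print(rent.skew() > 0)               # positive skew means the long tail is on the right
print(rent.mean() > rent.median())   # the tail drags the mean above the median
print((rent > 500_000).sum())        # listings past 0.5e6, the threshold discussed above
```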

Boxplot

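A statistical chart that shows a dataset's median, quartiles, and outliers at a glance. It is used to quickly get a sense of a distribution and to spot outliers.

Median (Q2): the middle value when the data is sorted by size
Q1 (first quartile): the value below which the lowest 25% of the data falls
Q3 (third quartile): the value above which the top 25% of the data falls
IQR (interquartile range): Q3 - Q1, the range covered by the middle 50% of the data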

Boxplot of the Size column
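To make the boxplot rule concrete, here is the arithmetic for the Size column using the quartiles from the describe() table above (Q1 = 550, Q3 = 1200). It assumes the common 1.5 × IQR whisker rule, which is the default (`whis=1.5`) for matplotlib and seaborn boxplots:

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 = 1200 - 550 = 650 \\
\text{lower fence} &= Q_1 - 1.5\,\mathrm{IQR} = 550 - 975 = -425 \\
\text{upper fence} &= Q_3 + 1.5\,\mathrm{IQR} = 1200 + 975 = 2175
\end{aligned}
```

Under that rule, Size values above 2175 sq ft are drawn as individual dots, and nothing can fall below the lower fence because the minimum Size is 10.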

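Identifying outliers: the dots in the boxplot above are points beyond the whiskers, so by the boxplot's definition they are outlier candidates.
Genuinely anomalous cases are removed from the training data.
However, a point lying far from the rest is not automatically an outlier.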
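Given that caution, it helps to flag points outside the fences first and decide afterwards. A minimal sketch; `iqr_outlier_mask` is a hypothetical helper name, k = 1.5 mirrors the usual whisker length, and the Size values are illustrative (taken from this post's tables plus the 8000 maximum).

```python
import pandas as pd

def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where s lies outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1 = s.quantile(0.25)  # pandas uses linear interpolation by default
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Illustrative Size values in sq ft, not the full dataset
size = pd.Series([550, 800, 850, 1000, 1100, 1200, 8000])
mask = iqr_outlier_mask(size)
print(size[mask].tolist())  # [8000]
```

On the real data, `df[iqr_outlier_mask(df['Size'])]` would list the flagged rows for inspection, and `df[~iqr_outlier_mask(df['Size'])]` would drop them only once you are convinced they are errors.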

df.info()

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 4746 entries, 0 to 4745
# Data columns (total 12 columns):
#  #   Column             Non-Null Count  Dtype 
# ---  ------             --------------  ----- 
#  0   Posted On          4746 non-null   object
#  1   BHK                4746 non-null   int64 
#  2   Rent               4746 non-null   int64 
#  3   Size               4746 non-null   int64 
#  4   Floor              4746 non-null   object
#  5   Area Type          4746 non-null   object
#  6   Area Locality      4746 non-null   object
#  7   City               4746 non-null   object
#  8   Furnishing Status  4746 non-null   object
#  9   Tenant Preferred   4746 non-null   object
#  10  Bathroom           4746 non-null   int64 
#  11  Point of Contact   4746 non-null   object
# dtypes: int64(4), object(8)

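With the dtypes confirmed, each column needs an encoding that matches its values: the int64 columns are already numeric, while the object (string) columns have to be converted to numbers before modeling.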
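A quick way to split columns by kind before choosing encodings, sketched on a toy frame with a few of this dataset's columns (values copied from the first rows). It uses `exclude="number"` rather than `include="object"` so that text columns are still caught in pandas versions that may store strings with a dedicated string dtype instead of object.

```python
import pandas as pd

# Toy frame with a few of the dataset's columns (values from the first rows)
df = pd.DataFrame({
    "BHK": [2, 2],
    "Rent": [10000, 20000],
    "Area Type": ["Super Area", "Super Area"],
    "City": ["Kolkata", "Kolkata"],
    "Furnishing Status": ["Unfurnished", "Semi-Furnished"],
})

numeric_cols = df.select_dtypes(include="number").columns.tolist()      # usable as-is
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()  # need encoding

print(numeric_cols)
print(categorical_cols)
```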

Dropping unneeded columns

Posted On: the date the listing was posted. It doesn't look meaningful for predicting rent, so it is a candidate for removal.

Floor: running value_counts() on it ends with "Name: count, Length: 480, dtype: int64", i.e. 480 distinct values.

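In particular, too many values are special cases that can't be cleanly converted to numbers, so dropping the column seemed the better option.
Many values also appeared only once.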
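To see why Floor resists clean conversion, here is a sketch that tries to parse the "<floor> out of <total>" format with a regular expression. The first two strings follow the format shown in the tables above; the last three are assumed examples of "special" values, not a verified list from the dataset, and treating "Ground" as floor 0 is likewise an assumption.

```python
import pandas as pd

# First two follow the format in the tables above; the last three are assumed "special" values
floors = pd.Series(["Ground out of 2", "1 out of 3", "Upper Basement out of 3", "Ground", "3"])

# Capture "<floor> out of <total>", where <floor> is digits or "Ground"; non-matches become NaN
parts = floors.str.extract(r"^(?P<floor>\d+|Ground) out of (?P<total>\d+)$")
parts["floor"] = parts["floor"].replace("Ground", "0")  # assumption: ground floor counts as 0
parts = parts.apply(pd.to_numeric)

# Rows the pattern could not handle; each would need its own hand-written rule
unparsed = floors[parts["floor"].isna()]
print(unparsed.tolist())  # ['Upper Basement out of 3', 'Ground', '3']
```

Every unparsed pattern needs another special case, which is the cost the author chose to avoid by dropping the column.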

Area Locality: this also has far too many distinct values. I decided to come back to it after learning clustering.
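"Too many values" and "many values appear only once" can be put into numbers per column. A minimal sketch; `cardinality_report` is a hypothetical helper, and the toy frame uses three localities from the first table above plus one hypothetical repeat. On the real data it could be called as `cardinality_report(df[['Floor', 'Area Locality']])`.

```python
import pandas as pd

def cardinality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per column: number of distinct values and how many occur exactly once (hypothetical helper)."""
    rows = []
    for col in df.columns:
        counts = df[col].value_counts()
        rows.append({
            "column": col,
            "n_unique": int(counts.size),
            "n_singletons": int((counts == 1).sum()),
        })
    return pd.DataFrame(rows).set_index("column")

# Three localities from the first table plus one hypothetical repeat of "Bandel"
toy = pd.DataFrame({
    "Area Locality": ["Bandel", "Phool Bagan, Kankurgachi", "Salt Lake City Sector 2", "Bandel"],
    "City": ["Kolkata", "Kolkata", "Kolkata", "Kolkata"],
})
report = cardinality_report(toy)
print(report)
```

A column whose n_unique is close to its row count, with most values being singletons, gives a model very little to generalize from unless the values are grouped first.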

Next, we have to decide which of the remaining columns need one-hot encoding.

df.drop(['Floor','Posted On', 'Area Locality'], axis=1, inplace=True)

BHK	Rent	Size	Area Type	City	Furnishing Status	Tenant Preferred	Bathroom	Point of Contact
2	10000	1100	Super Area	Kolkata	Unfurnished	Bachelors/Family	2	Contact Owner
2	20000	800	Super Area	Kolkata	Semi-Furnished	Bachelors/Family	1	Contact Owner
2	17000	1000	Super Area	Kolkata	Semi-Furnished	Bachelors/Family	1	Contact Owner
2	10000	800	Super Area	Kolkata	Unfurnished	Bachelors/Family	1	Contact Owner
2	7500	850	Carpet Area	Kolkata	Unfurnished	Bachelors	1	Contact Owner
...	...	...	...	...	...	...	...	...
2	15000	1000	Carpet Area	Hyderabad	Semi-Furnished	Bachelors/Family	2	Contact Owner
3	29000	2000	Super Area	Hyderabad	Semi-Furnished	Bachelors/Family	3	Contact Owner
3	35000	1750	Carpet Area	Hyderabad	Semi-Furnished	Bachelors/Family	3	Contact Agent
3	45000	1500	Carpet Area	Hyderabad	Semi-Furnished	Family	2	Contact Agent
2	15000	1000	Carpet Area	Hyderabad	Unfurnished	Bachelors	2	Contact Owner
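The remaining text columns (Area Type, City, Furnishing Status, Tenant Preferred, Point of Contact) are low-cardinality, which makes them natural one-hot candidates. A minimal sketch with pandas `get_dummies` on three rows copied from the table above, encoding two of those columns; this is one possible approach, not necessarily what the next post does.

```python
import pandas as pd

# Three rows copied from the table above (subset of columns)
df = pd.DataFrame({
    "BHK": [2, 2, 3],
    "Rent": [10000, 7500, 45000],
    "Size": [1100, 850, 1500],
    "Area Type": ["Super Area", "Carpet Area", "Carpet Area"],
    "City": ["Kolkata", "Kolkata", "Hyderabad"],
})

# One 0/1 column per category; numeric columns pass through unchanged and come first
encoded = pd.get_dummies(df, columns=["Area Type", "City"], dtype=int)
print(encoded.columns.tolist())
```

For linear models, `drop_first=True` removes one redundant dummy per column; tree-based models usually don't need it.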
