The Algorithms logo
算法
关于我们捐赠

房价预测

H
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Finding Best ML Algorithm for House Price Prediction using k Cross Validation and GridSearchCV. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In machine learning, we couldn’t fit the model on the training data and can’t say that the model will work accurately for the real data. For this, we must assure that our model got the correct patterns from the data, and it is not getting up too much noise. For this purpose, we use the cross-validation technique.Cross-validation is a technique in which we train our model using the subset of the data-set and then evaluate using the complementary subset of the data-set."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dataset is downloaded from here: https://www.kaggle.com/amitabhajoy/bengaluru-house-price-data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "from matplotlib import pyplot as plt\n",
    "%matplotlib inline \n",
    "import matplotlib\n",
    "matplotlib.rcParams[\"figure.figsize\"]=(20,10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [],
   "source": [
    "df1 = pd.read_csv(\"Bengaluru_House_Data.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>area_type</th>\n",
       "      <th>availability</th>\n",
       "      <th>location</th>\n",
       "      <th>size</th>\n",
       "      <th>society</th>\n",
       "      <th>total_sqft</th>\n",
       "      <th>bath</th>\n",
       "      <th>balcony</th>\n",
       "      <th>price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Super built-up  Area</td>\n",
       "      <td>19-Dec</td>\n",
       "      <td>Electronic City Phase II</td>\n",
       "      <td>2 BHK</td>\n",
       "      <td>Coomee</td>\n",
       "      <td>1056</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>39.07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Plot  Area</td>\n",
       "      <td>Ready To Move</td>\n",
       "      <td>Chikka Tirupathi</td>\n",
       "      <td>4 Bedroom</td>\n",
       "      <td>Theanmp</td>\n",
       "      <td>2600</td>\n",
       "      <td>5.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>120.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Built-up  Area</td>\n",
       "      <td>Ready To Move</td>\n",
       "      <td>Uttarahalli</td>\n",
       "      <td>3 BHK</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1440</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>62.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Super built-up  Area</td>\n",
       "      <td>Ready To Move</td>\n",
       "      <td>Lingadheeranahalli</td>\n",
       "      <td>3 BHK</td>\n",
       "      <td>Soiewre</td>\n",
       "      <td>1521</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>95.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Super built-up  Area</td>\n",
       "      <td>Ready To Move</td>\n",
       "      <td>Kothanur</td>\n",
       "      <td>2 BHK</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1200</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>51.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              area_type   availability                  location       size  \\\n",
       "0  Super built-up  Area         19-Dec  Electronic City Phase II      2 BHK   \n",
       "1            Plot  Area  Ready To Move          Chikka Tirupathi  4 Bedroom   \n",
       "2        Built-up  Area  Ready To Move               Uttarahalli      3 BHK   \n",
       "3  Super built-up  Area  Ready To Move        Lingadheeranahalli      3 BHK   \n",
       "4  Super built-up  Area  Ready To Move                  Kothanur      2 BHK   \n",
       "\n",
       "   society total_sqft  bath  balcony   price  \n",
       "0  Coomee        1056   2.0      1.0   39.07  \n",
       "1  Theanmp       2600   5.0      3.0  120.00  \n",
       "2      NaN       1440   2.0      3.0   62.00  \n",
       "3  Soiewre       1521   3.0      1.0   95.00  \n",
       "4      NaN       1200   2.0      1.0   51.00  "
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "area_type\n",
       "Built-up  Area          2418\n",
       "Carpet  Area              87\n",
       "Plot  Area              2025\n",
       "Super built-up  Area    8790\n",
       "Name: area_type, dtype: int64"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.groupby('area_type')['area_type'].agg('count')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2 = df1.drop(['area_type','society','balcony','availability'] , axis=\"columns\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>location</th>\n",
       "      <th>size</th>\n",
       "      <th>total_sqft</th>\n",
       "      <th>bath</th>\n",
       "      <th>price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Electronic City Phase II</td>\n",
       "      <td>2 BHK</td>\n",
       "      <td>1056</td>\n",
       "      <td>2.0</td>\n",
       "      <td>39.07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Chikka Tirupathi</td>\n",
       "      <td>4 Bedroom</td>\n",
       "      <td>2600</td>\n",
       "      <td>5.0</td>\n",
       "      <td>120.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Uttarahalli</td>\n",
       "      <td>3 BHK</td>\n",
       "      <td>1440</td>\n",
       "      <td>2.0</td>\n",
       "      <td>62.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Lingadheeranahalli</td>\n",
       "      <td>3 BHK</td>\n",
       "      <td>1521</td>\n",
       "      <td>3.0</td>\n",
       "      <td>95.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Kothanur</td>\n",
       "      <td>2 BHK</td>\n",
       "      <td>1200</td>\n",
       "      <td>2.0</td>\n",
       "      <td>51.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   location       size total_sqft  bath   price\n",
       "0  Electronic City Phase II      2 BHK       1056   2.0   39.07\n",
       "1          Chikka Tirupathi  4 Bedroom       2600   5.0  120.00\n",
       "2               Uttarahalli      3 BHK       1440   2.0   62.00\n",
       "3        Lingadheeranahalli      3 BHK       1521   3.0   95.00\n",
       "4                  Kothanur      2 BHK       1200   2.0   51.00"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data Cleaning: Handling NA/Null values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "location       1\n",
       "size          16\n",
       "total_sqft     0\n",
       "bath          73\n",
       "price          0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "location      0\n",
       "size          0\n",
       "total_sqft    0\n",
       "bath          0\n",
       "price         0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3 = df2.dropna()\n",
    "df3.isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['2 BHK', '4 Bedroom', '3 BHK', '4 BHK', '6 Bedroom', '3 Bedroom',\n",
       "       '1 BHK', '1 RK', '1 Bedroom', '8 Bedroom', '2 Bedroom',\n",
       "       '7 Bedroom', '5 BHK', '7 BHK', '6 BHK', '5 Bedroom', '11 BHK',\n",
       "       '9 BHK', nan, '9 Bedroom', '27 BHK', '10 Bedroom', '11 Bedroom',\n",
       "       '10 BHK', '19 BHK', '16 BHK', '43 Bedroom', '14 BHK', '8 BHK',\n",
       "       '12 Bedroom', '13 BHK', '18 Bedroom'], dtype=object)"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2['size'].unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Feature Engineering"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-81-4c4c73fbe7f4>:1: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.ac.cn/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  df3['bhk'] = df3['size'].apply(lambda x: int(x.split(' ')[0]))\n"
     ]
    }
   ],
   "source": [
    "df3['bhk'] = df3['size'].apply(lambda x: int(x.split(' ')[0]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 2,  4,  3,  6,  1,  8,  7,  5, 11,  9, 27, 10, 19, 16, 43, 14, 12,\n",
       "       13, 18], dtype=int64)"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3['bhk'].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>location</th>\n",
       "      <th>size</th>\n",
       "      <th>total_sqft</th>\n",
       "      <th>bath</th>\n",
       "      <th>price</th>\n",
       "      <th>bhk</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1718</th>\n",
       "      <td>2Electronic City Phase II</td>\n",
       "      <td>27 BHK</td>\n",
       "      <td>8000</td>\n",
       "      <td>27.0</td>\n",
       "      <td>230.0</td>\n",
       "      <td>27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4684</th>\n",
       "      <td>Munnekollal</td>\n",
       "      <td>43 Bedroom</td>\n",
       "      <td>2400</td>\n",
       "      <td>40.0</td>\n",
       "      <td>660.0</td>\n",
       "      <td>43</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                       location        size total_sqft  bath  price  bhk\n",
       "1718  2Electronic City Phase II      27 BHK       8000  27.0  230.0   27\n",
       "4684                Munnekollal  43 Bedroom       2400  40.0  660.0   43"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3[df3.bhk>20]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['1056', '2600', '1440', ..., '1133 - 1384', '774', '4689'],\n",
       "      dtype=object)"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3.total_sqft.unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [],
   "source": [
    "def is_float(x):\n",
    "    try:\n",
    "        float(x)\n",
    "    except:\n",
    "        return False\n",
    "    return True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>location</th>\n",
       "      <th>size</th>\n",
       "      <th>total_sqft</th>\n",
       "      <th>bath</th>\n",
       "      <th>price</th>\n",
       "      <th>bhk</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>Yelahanka</td>\n",
       "      <td>4 BHK</td>\n",
       "      <td>2100 - 2850</td>\n",
       "      <td>4.0</td>\n",
       "      <td>186.000</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>122</th>\n",
       "      <td>Hebbal</td>\n",
       "      <td>4 BHK</td>\n",
       "      <td>3067 - 8156</td>\n",
       "      <td>4.0</td>\n",
       "      <td>477.000</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>137</th>\n",
       "      <td>8th Phase JP Nagar</td>\n",
       "      <td>2 BHK</td>\n",
       "      <td>1042 - 1105</td>\n",
       "      <td>2.0</td>\n",
       "      <td>54.005</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>165</th>\n",
       "      <td>Sarjapur</td>\n",
       "      <td>2 BHK</td>\n",
       "      <td>1145 - 1340</td>\n",
       "      <td>2.0</td>\n",
       "      <td>43.490</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>188</th>\n",
       "      <td>KR Puram</td>\n",
       "      <td>2 BHK</td>\n",
       "      <td>1015 - 1540</td>\n",
       "      <td>2.0</td>\n",
       "      <td>56.800</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>410</th>\n",
       "      <td>Kengeri</td>\n",
       "      <td>1 BHK</td>\n",
       "      <td>34.46Sq. Meter</td>\n",
       "      <td>1.0</td>\n",
       "      <td>18.500</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>549</th>\n",
       "      <td>Hennur Road</td>\n",
       "      <td>2 BHK</td>\n",
       "      <td>1195 - 1440</td>\n",
       "      <td>2.0</td>\n",
       "      <td>63.770</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>648</th>\n",
       "      <td>Arekere</td>\n",
       "      <td>9 Bedroom</td>\n",
       "      <td>4125Perch</td>\n",
       "      <td>9.0</td>\n",
       "      <td>265.000</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>661</th>\n",
       "      <td>Yelahanka</td>\n",
       "      <td>2 BHK</td>\n",
       "      <td>1120 - 1145</td>\n",
       "      <td>2.0</td>\n",
       "      <td>48.130</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>672</th>\n",
       "      <td>Bettahalsoor</td>\n",
       "      <td>4 Bedroom</td>\n",
       "      <td>3090 - 5002</td>\n",
       "      <td>4.0</td>\n",
       "      <td>445.000</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               location       size      total_sqft  bath    price  bhk\n",
       "30            Yelahanka      4 BHK     2100 - 2850   4.0  186.000    4\n",
       "122              Hebbal      4 BHK     3067 - 8156   4.0  477.000    4\n",
       "137  8th Phase JP Nagar      2 BHK     1042 - 1105   2.0   54.005    2\n",
       "165            Sarjapur      2 BHK     1145 - 1340   2.0   43.490    2\n",
       "188            KR Puram      2 BHK     1015 - 1540   2.0   56.800    2\n",
       "410             Kengeri      1 BHK  34.46Sq. Meter   1.0   18.500    1\n",
       "549         Hennur Road      2 BHK     1195 - 1440   2.0   63.770    2\n",
       "648             Arekere  9 Bedroom       4125Perch   9.0  265.000    9\n",
       "661           Yelahanka      2 BHK     1120 - 1145   2.0   48.130    2\n",
       "672        Bettahalsoor  4 Bedroom     3090 - 5002   4.0  445.000    4"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3[~df3['total_sqft'].apply(is_float)].head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Above shows that total_sqft can be a range (e.g. 2100-2850). For such case we can just take average of min and max value in the range. There are other cases such as 34.46Sq. Meter which one can convert to square ft using unit conversion. I am going to just drop such corner cases to keep things simple "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [],
   "source": [
    "def convert_sqft_to_num(x):\n",
    "    tokens = x.split('-')\n",
    "    if len(tokens) == 2:\n",
    "        return (float(tokens[0]) + float(tokens[1]))/2\n",
    "    try:\n",
    "        return float(x)\n",
    "    except:\n",
    "        return None\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [],
   "source": [
    "df4 = df3.copy()\n",
    "df4['total_sqft'] = df4['total_sqft'].apply(convert_sqft_to_num)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [],
   "source": [
    "df5 = df4.copy()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Add new feature called price per square feet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [],
   "source": [
    "df5['price_per_sqft'] = df5['price']*100000/df5['total_sqft']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1304"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df5.location.unique())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Examine locations which is a categorical variable. We need to apply dimensionality reduction technique here to reduce number of locations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [],
   "source": [
    "df5.location = df5.location.apply(lambda x: x.strip())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [],
   "source": [
    "location_stats = df5.groupby('location')['location'].agg('count').sort_values(ascending=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "location\n",
       "Whitefield           535\n",
       "Sarjapur  Road       392\n",
       "Electronic City      304\n",
       "Kanakpura Road       266\n",
       "Thanisandra          236\n",
       "                    ... \n",
       "LIC Colony             1\n",
       "Kuvempu Layout         1\n",
       "Kumbhena Agrahara      1\n",
       "Kudlu Village,         1\n",
       "1 Annasandrapalya      1\n",
       "Name: location, Length: 1293, dtype: int64"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "location_stats"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1052"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(location_stats[location_stats<=10])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dimensionality Reduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Any location having less than 10 data points should be tagged as \"other\" location. This way number of categories can be reduced by huge amount. Later on when we do one hot encoding, it will help us with having fewer dummy columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [],
   "source": [
    "location_stats_less_10 = location_stats[location_stats<=10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "location\n",
       "BTM 1st Stage          10\n",
       "Basapura               10\n",
       "Sector 1 HSR Layout    10\n",
       "Naganathapura          10\n",
       "Kalkere                10\n",
       "                       ..\n",
       "LIC Colony              1\n",
       "Kuvempu Layout          1\n",
       "Kumbhena Agrahara       1\n",
       "Kudlu Village,          1\n",
       "1 Annasandrapalya       1\n",
       "Name: location, Length: 1052, dtype: int64"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "location_stats_less_10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1293"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df5.location.unique())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [],
   "source": [
    "df5.location = df5.location.apply(lambda x: 'other' if x in location_stats_less_10 else x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "242"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df5.location.unique())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>location</th>\n",
       "      <th>size</th>\n",
       "      <th>total_sqft</th>\n",
       "      <th>bath</th>\n",
       "      <th>price</th>\n",
       "      <th>bhk</th>\n",
       "      <th>price_per_sqft</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Electronic City Phase II</td>\n",
       "      <td>2 BHK</td>\n",
       "      <td>1056.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>39.07</td>\n",
       "      <td>2</td>\n",
       "      <td>3699.810606</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Chikka Tirupathi</td>\n",
       "      <td>4 Bedroom</td>\n",
       "      <td>2600.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>120.00</td>\n",
       "      <td>4</td>\n",
       "      <td>4615.384615</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Uttarahalli</td>\n",
       "      <td>3 BHK</td>\n",
       "      <td>1440.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>62.00</td>\n",
       "      <td>3</td>\n",
       "      <td>4305.555556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Lingadheeranahalli</td>\n",
       "      <td>3 BHK</td>\n",
       "      <td>1521.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>95.00</td>\n",
       "      <td>3</td>\n",
       "      <td>6245.890861</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Kothanur</td>\n",
       "      <td>2 BHK</td>\n",
       "      <td>1200.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>51.00</td>\n",
       "      <td>2</td>\n",
       "      <td>4250.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   location       size  total_sqft  bath   price  bhk  \\\n",
       "0  Electronic City Phase II      2 BHK      1056.0   2.0   39.07    2   \n",
       "1          Chikka Tirupathi  4 Bedroom      2600.0   5.0  120.00    4   \n",
       "2               Uttarahalli      3 BHK      1440.0   2.0   62.00    3   \n",
       "3        Lingadheeranahalli      3 BHK      1521.0   3.0   95.00    3   \n",
       "4                  Kothanur      2 BHK      1200.0   2.0   51.00    2   \n",
       "\n",
       "   price_per_sqft  \n",
       "0     3699.810606  \n",
       "1     4615.384615  \n",
       "2     4305.555556  \n",
       "3     6245.890861  \n",
       "4     4250.000000  "
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df5.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Outlier Removal Using Business Logic"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>location</th>\n",
       "      <th>size</th>\n",
       "      <th>total_sqft</th>\n",
       "      <th>bath</th>\n",
       "      <th>price</th>\n",
       "      <th>bhk</th>\n",
       "      <th>price_per_sqft</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>other</td>\n",
       "      <td>6 Bedroom</td>\n",
       "      <td>1020.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>370.0</td>\n",
       "      <td>6</td>\n",
       "      <td>36274.509804</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>HSR Layout</td>\n",
       "      <td>8 Bedroom</td>\n",
       "      <td>600.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>200.0</td>\n",
       "      <td>8</td>\n",
       "      <td>33333.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58</th>\n",
       "      <td>Murugeshpalya</td>\n",
       "      <td>6 Bedroom</td>\n",
       "      <td>1407.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>6</td>\n",
       "      <td>10660.980810</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>68</th>\n",
       "      <td>Devarachikkanahalli</td>\n",
       "      <td>8 Bedroom</td>\n",
       "      <td>1350.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>8</td>\n",
       "      <td>6296.296296</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70</th>\n",
       "      <td>other</td>\n",
       "      <td>3 Bedroom</td>\n",
       "      <td>500.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>3</td>\n",
       "      <td>20000.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               location       size  total_sqft  bath  price  bhk  \\\n",
       "9                 other  6 Bedroom      1020.0   6.0  370.0    6   \n",
       "45           HSR Layout  8 Bedroom       600.0   9.0  200.0    8   \n",
       "58        Murugeshpalya  6 Bedroom      1407.0   4.0  150.0    6   \n",
       "68  Devarachikkanahalli  8 Bedroom      1350.0   7.0   85.0    8   \n",
       "70                other  3 Bedroom       500.0   3.0  100.0    3   \n",
       "\n",
       "    price_per_sqft  \n",
       "9     36274.509804  \n",
       "45    33333.333333  \n",
       "58    10660.980810  \n",
       "68     6296.296296  \n",
       "70    20000.000000  "
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df5[df5.total_sqft/df5.bhk<300].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Check above data points. We have 6 bhk apartment with 1020 sqft. Another one is 8 bhk and total sqft is 600. These are clear data errors that can be removed safely"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(13246, 7)"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df5.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {},
   "outputs": [],
   "source": [
    "df6 = df5[~(df5.total_sqft/df5.bhk<300)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(12502, 7)"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df6.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Outlier Removal Using Standard Deviation and Mean"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count     12456.000000\n",
       "mean       6308.502826\n",
       "std        4168.127339\n",
       "min         267.829813\n",
       "25%        4210.526316\n",
       "50%        5294.117647\n",
       "75%        6916.666667\n",
       "max      176470.588235\n",
       "Name: price_per_sqft, dtype: float64"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df6.price_per_sqft.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [],
   "source": [
    "def remove_pps_outliers(df):\n",
    "    df_out = pd.DataFrame()\n",
    "    for key, subdf in df.groupby('location'):\n",
    "        m = np.mean(subdf.price_per_sqft)\n",
    "        st = np.std(subdf.price_per_sqft)\n",
    "        reduced_df = subdf[(subdf.price_per_sqft>(m-st)) & (subdf.price_per_sqft<=(m+st))]\n",
    "        df_out = pd.concat([df_out,reduced_df],ignore_index=True)\n",
    "    return df_out    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [],
   "source": [
    "df7 = remove_pps_outliers(df6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(10241, 7)"
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df7.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA3sAAAJcCAYAAABAE73ZAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdf5zdd10n+tc7baUyGWyhRWIK22KCFLqa5Y7AXbM6qDwouQqL4J3u6l27zS4uP1ahygKuu6CuPnpBRC+PVW7RvUHAJRV1qWyK/DLhRhBuqpUf7bLJLo2EsBAr4GTWlrbzuX+cM82QTCYnyZw5c77zfD4eeZw5n+/3e857grZ9Pd7v7+dbrbUAAADQLRtGXQAAAAArT9gDAADoIGEPAACgg4Q9AACADhL2AAAAOkjYAwAA6CBhD4B1o6p+pKreN+C5P1NVv9n/+XFVdbyqLhhuhQCwcspz9gAYJ1V1d5JvTvJgkuNJ3pvkpa214yOu6RuTPL61Ntdf+2dJfrS1Nj2qugBY33T2ABhHP9ha25hkW5K/l+TVI64nSS5M8pOjLqKqLhx1DQCsDcIeAGOrtfY/kvxReqEvSVJVr6qq/1ZVs1V1Z1U9b9Gx66tq/6L3v1ZVn6uqv6mq26vqHyw69tqqenv/5yurqp0hSL0+yU9X1SVLHTzDd31jVb21qr5cVXdV1b+qqiNn8Tv9SVW9sar+OslrB/vbA6DrhD0AxlZVXZHk2UkOLVr+b0n+QZJvSvJzSd5eVZtO8xH/X3pB8ZFJfifJ71bVxedYzoEke5P89Dl812uSXJnk8UmemeRHT7r2TL/T05L89ySPTvKL51g/AB0j7AEwjv5TVc0m+VySL6UXlpIkrbXfba0dba3Nt9Z2JzmY5KlLfUhr7e2ttXtaaw+01t6Q5GFJvu086vq3Sf5lVV1+lt/1vyf5pdbal1trR5L8Xydde6bf6Whr7U39z/7b86gfgA4R9gAYR/+wtTaZZDrJE5NctnCgqv5JVd1RVV+pqq8kuWbx8cWq6qf6Y5Nf7Z/7Tac7dxCttU8leU+SV53ld31LesF1wedOuvZMv9PXnQ8AibAHwBhrre1LsivJLydJVf2dJG9J8tIkj2qtXZLkU0nq5Gv798y9Mr2u2qX9c7+61Lln6TVJ/nmSzWfxXV9IcsWiz3jsomsH+Z1srQ3AKYQ9AMbdryZ5ZlVtSzKRXvA5liRV9U/T64ItZTLJA/1zL6yqf5vkEedbTGvtUJLdSX7iLL7rliSvrqpLq2pzesFuwdn8TgDwEGEPgLHWWjuW5LeT/JvW2p1J3pDko0m+mOTvJvmT01z6R0luS/JfkxxOcm9Wbhzy59MLaYN+188nOZLks0k+kORdSe5LkrP8nQDgIR6qDsC6UVU3pPeg8+8d4NyfT3JFa+2G4Vd2yne/KMl1rbXvWe3vBqA7dPYAWE+enF73bFlVVUmeNMi5K6GqNlXVd1XVhqr6tiQ/leQPVuO7Aeiu5R4OCwCdUVX/KcnWJD88wOl/lt4Y5UvPdOIK+YYk/3eSq5J8Jck7k/z6Kn03AB1ljBMAAKCDjHECAAB00FiPcV522WXtyiuvHHUZAAAAI3H77bf/VWvt8qWOjXXYu/LKK3PgwIFRlwEAADASVXX4dMeMcQIAAHSQsAcAANBBwh4AAEAHjfU9e0u5//77c+TIkdx7772jLmWkLr744lxxxRW56KKLRl0KAAAwAp0Le0eOHMnk5GSuvPLKVNWoyxmJ1lruueeeHDlyJFddddWoywEAAEagc2Oc9957bx71qEet26CXJFWVRz3qUeu+uwkAAOtZ58JeknUd9Bb4OwAAgPWtk2EPAABgvRP2VtjnPve5POMZz8jVV1+dJz/5yfm1X/u1Jc977Wtfm82bN2fbtm154hOfmBe96EWZn59Pklx//fV517ve9XXnb9y4MUly991355prrnlo/S1veUue8pSn5Mtf/vKQfiMAAGAcrfuwNzub/OZvJq98Ze91dvb8Pu/CCy/MG97whtx111350z/90/z7f//vc+eddy557stf/vLccccdufPOO/PJT34y+/btO6vvetvb3pY3velNed/73pdLL730/AoHAAA6pXO7cZ6N/fuTHTuS+flkbi6ZmEhuvDHZsyfZvv3cPnPTpk3ZtGlTkmRycjJXX311Pv/5z+dJT3rSaa/52te+lnvvvfesAtstt9ySm266KR/84Adz2WWXnVuxAABAZ63bzt7sbC/ozc72gl7Se11YP378/L/j7rvvzp//+Z/naU972pLH3/jGN2bbtm3ZtGlTnvCEJ2Tbtm0PHXvFK16Rbdu2PfRnscOHD+elL31p3ve+9+Uxj3nM+RcKAAB0zroNe7t39zp6S5mf7x0/H8ePH8/zn//8/Oqv/moe8YhHLHnOwhjnl770pczNzeWd73znQ8de//rX54477njoz2KXX355Hve4x+WWW245vyIBAIDOWrdh7+DBEx29k83NJYcOnftn33///Xn+85+fH/mRH8kP/dAPnfH8iy66KNdee20+/OEPD/T5D3/4w3PbbbflzW9+c97xjnece6EAAEBnrdt79rZu7d2jt1Tgm5hItmw5t89trWXnzp25+uqrc+ONNw58zUc+8pFTxjWXc/nll+e9731vpqenc9lll+VZz3rWuRUMAAB00rrt7M3MJBtO89tv2NA7fi7+5E/+JG9729vyoQ996KH77fbs2bPkuQv37F1zzTV54IEH8uIXv/isvuuqq67KrbfemhtuuCEf+9jHzq1gAACgk6q1NuoaztnU1FQ7cODA163dddddufrqqwe6fqndODdsOL/dONeSs/m7AAAAxk9V3d5am1rq2Lod40x6ge7o0d5mLIcO9UY3Z2aS/vPLAQAAxta6DntJL9jt3DnqKgAAAFbWur1nDwAAWF+md01netf0qMtYNcIeAABABwl7AAAAHbTu79kDAAC6a/HY5r7D+05Z23v93tUtaBXp7K2we++9N0996lPzHd/xHXnyk5+c17zmNUue99rXvjabN2/Otm3b8sQnPjEvetGLMj8/nyS5/vrr8653vevrzt/Y3yL07rvvzjXXXPPQ+lve8pY85SlPyZe//OUh/UYAAMA40tnLiWS/Eqn+YQ97WD70oQ9l48aNuf/++7N9+/Y8+9nPztOf/vRTzn35y1+en/7pn878/Hy++7u/O/v27csznvGMgb/rbW97W970pjflQx/6UC699NLzrh0AALpm8X/jr+R/948DYW+FVdVDXbj7778/999/f6pq2Wu+9rWv5d577z2rwHbLLbfkpptuygc/+MFcdtll51UzAADQPcY4h+DBBx/Mtm3b8uhHPzrPfOYz87SnPW3J8974xjdm27Zt2bRpU57whCdk27ZtDx17xStekW3btj30Z7HDhw/npS99ad73vvflMY95zFB/FwAAYDyt287eMG/UvOCCC3LHHXfkK1/5Sp73vOflU5/61NfdZ7dgYYzz/vvvzwte8IK8853vzHXXXZckef3rX58XvOAFD5270C1MkssvvzyPfOQjc8stt+TlL3/5OdcJAADryXoZ31ygszdEl1xySaanp/Pe97532fMuuuiiXHvttfnwhz880Oc+/OEPz2233ZY3v/nNecc73rESpQIAAB2zbjt7w7pR89ixY7noootyySWX5G//9m/zgQ98IK985SuXvaa1lo985COnjGsu5/LLL8973/veTE9P57LLLsuznvWs8y0dAADoEJ29FfaFL3whz3jGM/Lt3/7t+c7v/M4885nPzA/8wA8see7CPXvXXHNNHnjggbz4xS8+q++66qqrcuutt+aGG27Ixz72sZUoHwAA6IhqrY26hnM2NTXVDhw48HVrd911V66++uqz+pyubsF6Ln8XAADA+Kiq21trU0sdW7djnIt1LeQBAAAY4wQAAOigToa9cR5NXSn+DgAAYH3rXNi7+OKLc88996zrsNNayz333JOLL7541KUAADAE07umv+4Z0bCUzt2zd8UVV+TIkSM5duzYqEsZqYsvvjhXXHHFqMsAAABGpHNh76KLLspVV1016jIAAABGqnNhDwAAumjx2Oa+w/tOWbPDPCfr3D17AAAA6OwBAMBYWNy5W+jo6eaxHJ09AACADhL2AAAAOsgYJwAAjBnjmwxCZw8AAMaMh6ozCGEPAACgg4Q9AACADnLPHgAAjAEPVeds6ewBAAB0kM4eAACMAQ9V52zp7AEAAHSQsAcAANBBxjgBAGDMGN9kEDp7AAAAHSTsAQAAdJCwBwAAsIzpXdNf90zDcSHsAQAAdJCwBwAA0EF24wQAADjJ4rHNfYf3nbI2Djui6uwBAAB0kM4eAABwVhY6XOPQ3TpXi3+3cf19h97Zq6oLqurPq+o9/fePrKr3V9XB/uuli859dVUdqqrPVNWzhl0bAABAV63GGOdPJrlr0ftXJflga21rkg/236eqnpTkuiRPTnJtkl+vqgtWoT4AAFgXxvURApyboY5xVtUVSf63JL+Y5Mb+8nOTTPd/fmuSvUle2V9/Z2vtviSfrapDSZ6a5KPDrBEAADizLmxYcq7G9XcbdmfvV5P8qyTzi9a+ubX2hSTpvz66v745yecWnXekv/Z1quqFVXWgqg4cO3ZsOFUDAACMuaF19qrqB5J8qbV2e1VND3LJEmvtlIXWbk5yc5JMTU2dchwAADhhpTpyXdiwZL0Z5hjndyV5TlXtSHJxkkdU1duTfLGqNrXWvlBVm5J8qX/+kSSPXXT9FUmODrE+AACAzhpa2GutvTrJq5Ok39n76dbaj1bV65P8WJKb+q/v7l9ya5LfqapfSfItSbYm+fiw6gMAgPVAR279GsVz9m5KcktV7Uzyl0l+OElaa5+uqluS3JnkgSQvaa09OIL6AACAZQiL42FVwl5rbW96u26mtXZPku87zXm/mN7OnQAAwArQzVu/RtHZAwAARkDgW19W46HqAAAArDKdPQAA6Jj1/AB0TtDZAwAA6CCdPQAA6BiPWyDR2QMAAOgkYQ8AAKCDjHECAECHGd9cv3T2AAAAOkjYAwAA6CBhDwAAoIOEPQAAgA4S9gAAADpI2AMAAOggYQ8AAKCDhD0AAIAOEvYAAAA6SNgDAADoIGEPAACgg4Q9AACADhL2AAAAOkjYAwAA6CBhDwAAoIOEPQAAgA4S9gAAADpI2AMAAOggYQ8AAKCDhD0AAIAOEvYAAAA6SNgDAADoIGEPAACgg4Q9AACADhL2AABgxKZ3TWd61/Soy6BjhD0AAIAOEvYAAAA66MJRFwAAAOvR4rHNfYf3nbK29/q9q1sQnaOzBwAA0EE6ewAAMAKLO3cLHb1Bu3lnez7rk84eAABABwl7AAAAHWSMEwAARmyQcUwbunC2dPYAAAA6SGcPAIB1ZVw3NzmfDV1Yn3T2AAAAOkjYAwAA6CBjnAAAdF7XNjcZt3oZDZ09AACADtLZAwCg82xuwnqkswcAANBBwh4AAEAHGeMEAGBdMb7JeqGzBwAA0EHCHgAAQAcJewAAAB0k7AEAAHSQsAcAANBBwh4AAEAHCXsAANBh07umM71retRlMALCHgAAQAcJewAAAB104agLAAAAVtbisc19h/edsrb3+r2rWxAjobMHAADQQTp7AADQMYs7dwsdPd289UdnDwAAoIOEPQAAgA4yxgkAAB1mfHP90tkDAADoIGEPAACgg4Q9AACADhL2AAAAOkjYAwAA6CBhDwAAoIOEPQAAgA4S9gAAgDVtetd0pndNj7qMsTO0sFdVF1fVx6vqL6rq01X1c/3111bV56vqjv6fHYuueXVVHaqqz1TVs4ZVGwAAQNddOMTPvi/J97bWjlfVRUn2V9Vt/WNvbK398uKTq+pJSa5L8uQk35LkA1X1hNbag0OsEQAAoJOGFvZaay3J8f7bi/p/2jKXPDfJO1tr9yX5bFUdSvLUJB8dVo0AAMDatHhsc9/hfaes7b1+7+oWNIaGes9eVV1QVXck+VKS97fWPtY/9NKq+kRV/YequrS/tjnJ5xZdfqS/dvJnvrCqDlTVgWPHjg2zfAAAgLE1zDHO9Ecwt1XVJUn+oKquSfIbSX4hvS7fLyR5Q5IbktRSH7HEZ96c5OYkmZqaWq5TCAAAjKnFnbuFjp5u3tlZld04W2tfSbI3ybWttS+21h5src0neUt6o5pJr5P32EWXXZHk6GrUBwAA0DXD3I3z8n5HL1X1jUm+P8l/qapNi057XpJP9X++Ncl1VfWwqroqydYkHx9WfQAAAF02zDHOTUneWlUXpBcqb2mtvaeq3lZV29Ib0bw7yY8nSWvt01V1S5I7kzyQ5CV24gQAAIxvnpvqbZo5nqamptqBAwdGXQYAAMBIVNXtrbWppY6tyj17AAAArC5hDwAAoIOEPQAAgA4S9gAAADpI2AMAAOggYQ8AAKCDhD0AAIAOEvYAAGDEpndNZ3rX9KjLWLP8/ZwbYQ8AAKCDhD0AAIAOunDUBQAAwHq0eCxx3+F9p6ztvX7v6ha0xvj7OX86ewAAAB1UrbVR13DOpqam2oEDB0ZdBgAAnJeFjpVu1dL8/ZxeVd3eWpta6pjOHgAAQAcJewAAAB1kjBMAAGBMGeMEAABYZ4Q9AAAYE9O7pr/u8QOwHGEPAACgg4Q9AACADrpw1AUAAACnt3hsc9/hfaesefYcp6OzBwAA0EE6ewAAsIYt7twtdPR08xiEzh4AAHDW7Ay69gl7AAAAHWSMEwAAxoTxTc6GsAcAAAzEzqDjxRgnAABAB+nsAQAAA7Ez6HjR2QMAAOggYQ8AAKCDjHECAABnzfjm2qezBwAA0EHCHgAAQAcJewAAAB0k7AEAAHSQsAcAANBBwh4AAEAHCXsAAAAdJOwBAAB0kLAHAAAjNr1rOtO7pkddBh0j7AEAAHSQsAcAANBBF466AAAAWI8Wj23uO7zvlLW91+9d3YLoHJ09AACADtLZAwCAEVjcuVvo6OnmsZJ09gAAADpI2AMAAOggY5wAADBixjcZBp09AACADhL2AAAAOkjYAwAA6KCBw15VTVTVBcMsBgAAgJVx2rBXVRuq6h9X1X+uqi8l+S9JvlBVn66q11fV1tUrEwAAgLOxXGfvj5N8a5JXJ3lMa+2xrbVHJ/kHSf40yU1V9aOrUCMAAABnablHL3x/a+3+kxdba3+d5PeS/F5VXTS0ygAAADhnp+3sLQS9qvrWqnpY/+fpqvqJqrpk8TkAAACsLYNs0PJ7SR6sqi1JfivJVUl+Z6hVAQAAcF4GCXvzrbUHkjwvya+21l6eZNNwywIAAOB8DBL27q+qf5Tkx5K8p7/mXj0AAIA1bJCw90+T/K9JfrG19tmquirJ24dbFgAAAOdjud04kySttTur6pVJHtd//9kkNw27MAAAAM7dGTt7VfWDSe5I8t7++21VdeuwCwMAAODcDTLG+dokT03ylSRprd2R3o6cAAAArFGDhL0HWmtfPWmtDaMYAAAAVsYZ79lL8qmq+sdJLqiqrUl+IslHhlsWAAAA52OQzt6/TPLkJPel9zD1ryZ52TCLAgAA4PwMshvn/0zyr6vql1prc6tQEwAAAOdpkN04/35V3Znkrv7776iqXx96ZQAAAJyzQcY435jkWUnuSZLW2l8k+e5hFgUAAMD5GSTspbX2uZOWHhxCLQAAAKyQQXbj/FxV/f0kraq+Ib3dOO8ablkAAACcj0E6e/8iyUuSbE7y+STb+u8BAABYowbZjfOvkvzIKtQCAADAChlkN87HV9UfVtWxqvpSVb27qh4/wHUXV9XHq+ovqurTVfVz/fVHVtX7q+pg//XSRde8uqoOVdVnqupZ5/erAQAArF+DjHH+TpJbkmxK8i1JfjfJfxzguvuSfG9r7TvSG/28tqqenuRVST7YWtua5IP996mqJyW5Lr0HuF+b5Ner6oKz+3UAAABIBgt71Vp7W2vtgf6ftydpZ7qo9Rzvv72o/6cleW6St/bX35rkH/Z/fm6Sd7bW7mutfTbJoSRPPYvfBQAAgL5Bwt4fV9WrqurKqvo7VfWvkvzn/jjmI5e7sKouqKo7knwpyftbax9L8s2ttS8kSf/10f3TNydZ/IiHI/21kz/zhVV1oKoOHDt2bIDyAQAA1p9BHr0w03/98ZPWb0ivU3fa+/daaw8m2VZVlyT5g6q6ZpnvqaU+YonPvDnJzUkyNTV1xg4jAADAejTIbpxXne+XtNa+UlV707sX74tVtam19oWq2pRe1y/pdfIeu+iyK5IcPd/vBgAAWI/OGPaq6p8std5a++0zXHd5kvv7Qe8bk3x/kv8zya1JfizJTf3Xd/cvuTXJ71TVr6S3EczWJB8f8PcAAABgkUHGOL9z0c8XJ/m+JH+WZNmwl97unW/t76i5IcktrbX3VNVHk9xSVTuT/GWSH06S1tqnq+qWJHcmeSDJS/pjoAAAAJylau3sbnurqm9K8rbW2nOGU9Lgpqam2oEDB0ZdBgAAwEhU1e2ttamljg2yG+fJ/md6I5YAAACsUYPcs/eHObEr5oYkT0rvweoAAACsUYPcs/fLi35+IMnh1tqRIdUDAADAClg27PU3V/l0a+2v+u+/Icn1VfXy1trVq1EgAAAAZ++09+xV1XVJ/jrJJ6pqX1U9I8l/T/LsJD+ySvUBAABwDpbr7P1skv+ltXaoqp6S5KNJrmut/cHqlAYAAMC5Wm43zq+11g4lSWvtz5J8VtADAAAYD8t19h5dVTcuer9x8fvW2q8MrywAAADOx3Jh7y1JJpd5DwAAwBp12rDXWvu51SwEAACAlTPIc/YA6LjZ2WT37uTgwWTr1mRmJpk0ywEAY03YA1jn9u9PduxI5ueTublkYiK58cZkz55k+/ZRVwcAnKvlduMEoONmZ3tBb3a2F/SS3uvC+vHjo60PADh3Z+zsVdXDkjw/yZWLz2+t/fzwygJgNeze3evoLWV+vnd8587VrQkAWBmDjHG+O8lXk9ye5L7hlgPAajp48ERH72Rzc8mhQ6tbDwCwcgYJe1e01q4deiUArLqtW3v36C0V+CYmki1bVr8mAGBlDHLP3keq6u8OvRIAVt3MTLLhNP8m2LChdxwAGE+DhL3tSW6vqs9U1Seq6pNV9YlhFwbA8E1O9nbdnJzsdfKS3uvC+saNo60PADh3g4xxPnvoVQAwMtu3J0eP9jZjOXSoN7o5MyPoAcC4O2PYa60dTpKqenSSi4deEQCrbuNGu24CQNeccYyzqp5TVQeTfDbJviR3J7ltyHUBAABwHga5Z+8Xkjw9yX9trV2V5PuS/MlQqwIAAOC8DBL27m+t3ZNkQ1VtaK39cZJtQ64LAACA8zDIBi1fqaqNST6c5B1V9aUkDwy3LAAAAM7HIJ295yb52yQvT/LeJP8tyQ8OsygAAADOzyC7cc4tevvWIdYCAADACjlt2Kuq/a217VU1m6QtPpSktdYeMfTqAAAAOCenDXutte3918nVKwcAAICVsFxn75HLXdha++uVLwcAAICVsNw9e7enN75ZSR6X5Mv9ny9J8pdJrhp6dQAAAJyT0+7G2Vq7qrX2+CR/lOQHW2uXtdYeleQHkvz+ahUIAADA2Rvk0Qvf2Vrbs/CmtXZbku8ZXkkAAACcr0Eeqv5XVfWzSd6e3ljnjya5Z6hVAbCqZmeT3buTgweTrVuTmZlk0vZcADDWBgl7/yjJa5L8Qf/9h/trAHTA/v3Jjh3J/HwyN5dMTCQ33pjs2ZNs3z7q6gCAczXIQ9X/OslPrkItAKyy2dle0JudPbE2N9d73bEjOXo02bhxNLUBAOfnjPfsVdUTqurmqnpfVX1o4c9qFAfAcO3e3evoLWV+vnccABhPg4xx/m6SNyf5zSQPDrccAFbTwYMnOnknm5tLDh1a3XoAgJUzSNh7oLX2G0OvBIBVt3Vr7x69pQLfxESyZcvq1wQArIxBHr3wh1X14qraVFWPXPgz9MoAGLqZmWTDaf5NsGFD7zgAMJ4G6ez9WP/1FYvWWpLHr3w5AKymycnerpsn78a5YUNv3eYsADC+BtmN86rVKASA0di+vbfr5u7dvXv0tmzpdfQEPQAYb6cNe1X1Q8td2Fr7/ZUvB4BR2Lgx2blz1FUAACtpuc7eDy5zrCUR9gAAANao04a91to/Xc1CALpoetd0kmTv9XtHWgcAsP4MshsnAAAAY0bYAwAA6KBBHr0AwFlYGN1Mkn2H952yZqQTAFgNA4W9qvr7Sa5cfH5r7beHVBMAAADn6Yxhr6reluRbk9yR5MH+cksi7AEsYXHnzgYtAMCoDNLZm0rypNZaG3YxAAAArIxBNmj5VJLHDLsQAAAAVs5pO3tV9YfpjWtOJrmzqj6e5L6F46215wy/PIDxZnwTABiV5cY4f3nVqgAAAGBFnTbstdb2JUlVPbu1dtviY1X1L5LsG3JtAGPPBi09s7PJ7t3JwYPJ1q3JzEwyOTnqqgCg2wbZoOXfVNV9rbUPJUlVvTLJdJI3D7MwALph//5kx45kfj6Zm0smJpIbb0z27Em2bx91dQDQXYOEveckeU9VvSLJtUme2F8DgGXNzvaC3uzsibW5ud7rjh3J0aPJxo2jqQ0Auu6MYa+19ldV9ZwkH0hye5IXeAwDwOktjG4myb7D+05ZW08jnbt39zp6S5mf7x3fuXN1awKA9WK53Thn09uNc8E3JHl8khdUVWutPWLYxQEw3g4ePNHJO9ncXHLo0Ll/tvshAWB5y23Q4tZ5gHOwOHys90CydWvvHr2lAt/ERLJly+rXBADrxSAPVU9VXVpVT62q7174M+zCABh/MzPJhtP8m2bDht5xAGA4znjPXlX9syQ/meSKJHckeXqSjyb53uGWBsC4m5zs7bp58m6cGzb01s92cxb3QwLA4AbZjfMnk3xnkj9trT2jqp6Y5OeGWxZANwgfvccrHD3a24zl0KHe6ObMjF04AWDYBgl797bW7q2qVNXDWmv/paq+beiVAdAZGzeuzK6b7ocEgMENEvaOVNUlSf5TkvdX1ZeTHB1uWQAAAJyPQZ6z97z+j6+tqj9O8k1JbhtqVQAAAJyXQTp7D2mt7UuSqvrLJI8bSkUAMADjmwCwvIEevbCEWvgjisgAAB2WSURBVNEqAAAAWFHnGvbailYBAADAijrtGGdV3Xi6Q0lsmA0AALCGLXfP3uQyx35tpQsBAABg5Zw27LXWPDgdAABgTJ32nr2q+tmqunSZ499bVT8wnLIAumF61/RDD/8GAFhNy41xfjLJe6rq3iR/luRYkouTbE2yLckHkvzS0CsEAADgrC03xvnuJO+uqq1JvivJpiR/k+TtSV7YWvvb1SkRAE610DH1vD0AWNoZH6reWjuY5ODZfnBVPTbJbyd5TJL5JDe31n6tql6b5J+n1ylMkp9pre3pX/PqJDuTPJjkJ1prf3S23wswaovHNvcd3nfKmnACAKyGM4a98/BAkp9qrf1ZVU0mub2q3t8/9sbW2i8vPrmqnpTkuiRPTvItST5QVU9orT04xBoBAAA6aWhhr7X2hSRf6P88W1V3Jdm8zCXPTfLO1tp9ST5bVYeSPDXJR4dVI8AwLO7cGTVcWbqmADC40+7GuZKq6sokfy/Jx/pLL62qT1TVf1i04+fmJJ9bdNmRLBEOq+qFVXWgqg4cO3bs5MMAAABkgM5eVT0hyW8k+ebW2jVV9e1JntNa+3eDfEFVbUzye0le1lr7m6r6jSS/kKT1X9+Q5IYktcTl7ZSF1m5OcnOSTE1NnXIcgO7SNQWAwQ3S2XtLklcnuT9JWmufSO/eujOqqovSC3rvaK39fv/6L7bWHmytzfc/+6n9048keeyiy69IcnSQ7wFYq/Zev1cYAQBGYpCw9/DW2sdPWnvgTBdVVSX5rSR3tdZ+ZdH6pkWnPS/Jp/o/35rkuqp6WFVdld7z/E7+XgAAAAYwyAYtf1VV35r+SGVVvSD9jVfO4LuS/B9JPllVd/TXfibJP6qqbf3PuzvJjydJa+3TVXVLkjvTC5MvsRMnMO6MGg6Pv1MAWN4gYe8l6d0j98Sq+nySzyb50TNd1Frbn6Xvw9uzzDW/mOQXB6gJAACAZQzyUPX/nuT7q2oiyYbW2uzwywIAAOB8DLIb5y8leV1r7Sv995em97D0nx12cQDjyLPgAIC1YJANWp69EPSSpLX25SQ7hlcSAAAA52uQe/YuqKqHtdbuS5Kq+sYkDxtuWQDjy7PgAIC1YJCw9/YkH6yq/ye9HTRvSPLWoVYFAADAeRlkg5bXVdUnk3xfertr/kJr7Y+GXhlAB9zxP+4480kAAEMwSGcvrbXbktw25FoAOmfbY7aNugQAYJ06bdirqv2tte1VNZv+A9UXDiVprbVHDL06ADphdjbZvTs5eDDZujWZmUkmJ8/vM90PCZwr//xgvTht2Gutbe+/nue/jgHWF49e+Hr79yc7diTz88ncXDIxkdx4Y7JnT7J9+6irA4DuWvbRC1W1oao+tVrFANAts7O9oDc72wt6Se91Yf348dHWBwBdtuw9e621+ar6i6p6XGvtL1erKIBx5tELJ+ze3evoLWV+vnd8587BP0/XFDhX/vnBejTIBi2bkny6qj6eZG5hsbX2nKFVBUAnHDx4oqN3srm55NCh1a0HANaTQcLezw29CgA6aevW3j16SwW+iYlky5az+zxdU+Bc+ecH69Fyu3FenORfJNmS5JNJfqu19sBqFQbQBev9PyRmZnqbsSxlw4becQBgOJbboOWtSabSC3rPTvKGVakIgM6YnOztujk52evkJb3XhfWNG0dbHwB0WbXWlj5Q9cnW2t/t/3xhko+31p6ymsWdydTUVDtw4MCoywDgDI4f723GcuhQb3RzZkbQA4CVUFW3t9amljq23D179y/80Fp7oKpWvDAA1oeNG89u100A4PwtF/a+o6r+pv9zJfnG/vtK0lprjxh6dQAAAJyT04a91toFq1kIAAAAK2e5DVoAAAAYU8IeAABABwl7AAAAHSTsAQAAdJCwBzBE07umM71retRlAADrkLAHAADQQcs9Zw+A83TH/7hj1CUAAOuUsAewwhaPbX71vq+esrb3+r2rWxAAsC4Z4wQAAOggYQ8AAKCDhD0AAIAOcs8ewApbfE/eJTddcsoaAMBqEPYAhmjbY7aNugQAYJ0S9lizZmeT3buTgweTrVuTmZlkcnLUVQEAwHio1tqoazhnU1NT7cCBA6MugyHYvz/ZsSOZn0/m5pKJiWTDhmTPnmT79lFXB9218IgIY6cAMB6q6vbW2tRSx2zQwpozO9sLerOzvaCX9F4X1o8fH219AAAwDoQ91pzdu3sdvaXMz/eOw7iY3jX9dQ9UBwBYLe7ZY805ePBER+9kc3PJoUOrWw903eIwuu/wvlPWjHQCwHjS2WPN2bq1d4/eUiYmki1bVrceAAAYRzZoYc2ZnU02b+69nmxyMjl6NNm4cfXrgkEt1Sn7nr/zPQ+treVOmQ1aAGC82KCFsTI52dt1c3LyRIdvYuLEuqAHAABn5p491qTt23sdvN27e/fobdnSe86eoMc4WNwVu/DnLzxlDQBgNQh7rFkbNyY7d466ClhfhFIA6A5jnAAAAB2kswewwi656ZKHfn6wPXjK2lde9ZVVrwkAWH909gAAADpIZw9ghS3u3C109HTzAIDVprMHAADQQcIeAABABxnjBBgi45sAwKjo7AEAAHSQsAcAANBBwh4AAEAHCXsAAAAdJOwBAAB0kLAHAADQQcIeAABABwl7AAAAHSTsAQAAdJCwBwAA0EHCHgAAQAcJewAAAB0k7AEAAHSQsAcAANBBwh4AAEAHCXsAAAAdJOwBAAB0kLAHAADQQcIeAABABwl7AAAAHSTsAQAAdJCwBwAA0EHCHgAAQAcJewAAAB104agL4MxmZ5Pdu5ODB5OtW5OZmWRyctRVAeNietd0kmTv9XtX5ToAYG0YWmevqh5bVX9cVXdV1aer6if764+sqvdX1cH+66WLrnl1VR2qqs9U1bOGVds42b8/2bw5ednLkte9rve6eXNvHQAA4HSGOcb5QJKfaq1dneTpSV5SVU9K8qokH2ytbU3ywf779I9dl+TJSa5N8utVdcEQ61vzZmeTHTt6r3NzvbW5uRPrx4+Ptj4AAGDtGtoYZ2vtC0m+0P95tqruSrI5yXOTTPdPe2uSvUle2V9/Z2vtviSfrapDSZ6a5KPDqnGt2707mZ9f+tj8fO/4zp2rWxMwHhZGMJNk3+F9p6ydbjTzXK8DANaeVdmgpaquTPL3knwsyTf3g+BCIHx0/7TNST636LIj/bWTP+uFVXWgqg4cO3ZsmGWP3MGDJzp6J5ubSw4dWt16AACA8TH0DVqqamOS30vystba31TVaU9dYq2dstDazUluTpKpqalTjnfJ1q3JxMTSgW9iItmyZfVrAsbD4g7c2Wy0cq7XAQBrz1A7e1V1UXpB7x2ttd/vL3+xqjb1j29K8qX++pEkj110+RVJjg6zvrVuZibZcJr/hTZs6B0HAABYyjB346wkv5Xkrtbaryw6dGuSH+v//GNJ3r1o/bqqelhVXZVka5KPD6u+cTA5mezZ03udmOitTUycWN+4cbT1AQAAa1e1NpxJyKranuT/TfLJJAvbjPxMevft3ZLkcUn+MskPt9b+un/Nv05yQ3o7eb6stXbbct8xNTXVDhw4MJT615Ljx3ubsRw61BvdnJkR9AAAgKSqbm+tTS15bFhhbzWsl7AHAACwlOXC3qrsxgkAAMDqEvYAAAA6SNgDAADoIGEPAACgg4Q9AACADhL2AAAAOkjYAwAA6CBhD4CRm941neld06MuAwA6RdgDAADoIGEPAACggy4cdQEArE+Lxzb3Hd53ytre6/eubkEA0DE6ewAAAB2kswfASCzu3C109HTzAGDl6OwBAAB0kLAHAADQQcY4YYzMzia7dycHDyZbtyYzM8nk5KirgvNnfBMAVp6wB2Ni//5kx45kfj6Zm0smJpIbb0z27Em2bx91dQAArDXGOGEMzM72gt7sbC/oJb3XhfXjx0dbHwAAa4+wB2Ng9+5eR28p8/O94wAAsJiwB2Pg4METHb2Tzc0lhw6tbj0AAKx9wh6Mga1be/foLWViItmyZXXrAQBg7RP2YAzMzCQbTvP/rRs29I4DAMBiwh6MgcnJ3q6bk5MnOnwTEyfWN24cbX0AAKw9Hr0AY2L79uTo0d5mLIcO9UY3Z2YEPQAAlibswRjZuDHZuXPUVQAAMA6McQIAAHSQsAcAANBBwh4AAEAHCXsAAAAdJOwBAAB0kLAHAADQQcIewICmd01netf0qMsAABiIsAcAANBBwh4AAEAHXTjqAgDWssVjm/sO7ztlbe/1e1e3IACAAensAQAAdJDOHsAyFnfuFjp6unkAwDjQ2QMAAOggYQ8AAKCDjHECDMj4JgAwToQ9WMLsbLJ7d3LwYLJ1azIzk0xOjrqqtVsXAABrT7XWRl3DOZuammoHDhwYdRl0zP79yY4dyfx8MjeXTEwkGzYke/Yk27erCwCAtaOqbm+tTS15TNiDE2Znk82be68nm5xMjh5NNm5UFwAAa8NyYc8GLbDI7t29ztlS5ud7x0dhrdYFAMDaJezBIgcP9kYklzI3lxw6tLr1LFirdQEAsHYJe7DI1q29e+GWMjGRbNmyuvUsWKt1AQCwdgl7sMjMTG/Tk6Vs2NA7PgprtS4AANYuYQ8WmZzs7W45OXmikzYxcWJ9VJugrNW6AABYuzxnD06yfXtvd8vdu3v3wm3Z0uucjTpQbd+efOYzyate1Xv9tm9Lbrop2bRptHUBALA2efQCjAnP2QMA4GQevQBjbna2F/RmZ0/syjk3d2L9+PHR1gcAwNoj7K2g2dnkN38zeeUre69LPQAbzoXn7AEAcLbcs7dClhqxu/FGI3asDM/ZAwDgbOnsrQAjdgyb5+wBAHC2hL0VYMSOYfOcPQAAzpawtwKM2DFsnrMHAMDZcs/eClgYsVsq8BmxY6Ws1ef/AQCwNnnO3gqYnU02b156983Jyd5/oPsPcgAAYKV5zt6QGbEDAADWGmOcK8SIHQAAsJYIeyto48Zk585RVwEAAGCMEwAAoJOEPQAAgA4S9gAAADpI2AMAAOggYQ8AAKCDhD0AAIAOEvYAAAA6SNgDAADoIGEPAACgg4Q9AACADhL2AAAAOkjYAwAA6KALR10AZzY7m+zenRw8mGzdmszMJJOTo64K1pfpXdNJkr3X7x1pHQAAgxpaZ6+q/kNVfamqPrVo7bVV9fmquqP/Z8eiY6+uqkNV9Zmqetaw6ho3+/cnmzcnL3tZ8rrX9V43b+6tAwAAnM4wxzh3Jbl2ifU3tta29f/sSZKqelKS65I8uX/Nr1fVBUOsbSzMziY7dvRe5+Z6a3NzJ9aPHx9tfQAAwNo1tDHO1tqHq+rKAU9/bpJ3ttbuS/LZqjqU5KlJPjqk8sbC7t3J/PzSx+bne8d37lzdmmA9WRjdTJJ9h/edsmakEwBYy0axQctLq+oT/THPS/trm5N8btE5R/prp6iqF1bVgao6cOzYsWHXOlIHD57o6J1sbi45dGh16wEAAMbHam/Q8htJfiFJ67++IckNSWqJc9tSH9BauznJzUkyNTW15DldsXVrMjGxdOCbmEi2bFn9mmA9Wdy5s0ELADBuVrWz11r7YmvtwdbafJK3pDeqmfQ6eY9ddOoVSY6uZm1r0cxMsuE0/wtt2NA7DgAAsJRVDXtVtWnR2+clWdip89Yk11XVw6rqqiRbk3x8NWtbiyYnkz17eq8TE721iYkT6xs3jrY+AABg7RraGGdV/cck00kuq6ojSV6TZLqqtqU3onl3kh9Pktbap6vqliR3JnkgyUtaaw8Oq7Zxsn17cvRobzOWQ4d6o5szM4IerDbjmwDAuKnWxve2t6mpqXbgwIFRlwEAADASVXV7a21qqWOj2I0TAACAIRP2AAAAOkjYAwAA6CBhDwAAoIOEPQAAgA4S9gAAADpI2AMAAOggYQ8AAKCDhD0AAIAOEvYAAAA6SNgDAADoIGEPAACgg4Q9AACADhL2AAAAOkjYAwAA6CBhDwAAoIOEPQAAgA4S9gAAADpI2AMAAOggYQ8AAKCDhD0AAIAOEvYAAAA6SNgDAADooAtHXQCMyuxssnt3cvBgsnVrMjOTTE4uf830rukkyd7r9w69PgAAOB/CHuvS/v3Jjh3J/HwyN5dMTCQ33pjs2ZNs3z7q6gAA4PwZ42TdmZ3tBb3Z2V7QS3qvC+vHj4+2PgAAWAk6e6w7u3f3OnpLmZ/vHd+588Tawuhmkuw7vO+UNSOdAACsRTp7rDsHD57o6J1sbi45dGh16wEAgGHQ2eOcnMvmJmvF1q29e/SWCnwTE8mWLV+/trhzZ4MWAADGhc4eZ23//mTz5uRlL0te97re6+bNvfVxMDOTbDjN/+Vv2NA7DgAA407Y46x0YXOTycnerpuTk71OXtJ7XVjfuHG09QEAwEowxslZOdvNTdaq7duTo0d79R461BvdnJk5c9AzvgkAwLgQ9jgrXdrcZOPG8QimAABwLoxxclYWNjdZylKbmwAAAKMh7HFWbG4CAADjQdjjrNjcBAAAxoN79jhr57q5CQAAsHqEPc6JzU0AAGBtM8YJAADQQcIeAABABwl7AAAAHSTsAQAAdJCwBwAA0EHCHgAAQAcJewAAAB0k7AEAAHSQsAcAANBBwh4AAEAHCXsAAAAdJOwBAAB0kLAHAADQQcIeAABABwl7AAAAHSTs8f+3d/+xV9V1HMefr/EVyFBUECNkfsk0Q3PEr+ky9YsN0zl/bMw0S8lqjTZDNzQdrdFapmJZxPJH6fyRP9JhaW6GyoA2pyB++fGFFH8EDBRTskwqTerdH+dzx/F6z+VKfu899/J6bGffcz/nfM/5nPvifr+fN+fH18zMzMzMOpCLPTMzMzMzsw7kYs/MzMzMzKwDudgzMzMzMzPrQIqIVvdht0l6DdjU6n400XBgW6s7Ye+LM2s/zqy9OK/248zajzNrP86s/fw/mR0SEQfWWtDWxd6eRtKKiJjY6n5Y45xZ+3Fm7cV5tR9n1n6cWftxZu2nvzLzZZxmZmZmZmYdyMWemZmZmZlZB3Kx115uanUH7H1zZu3HmbUX59V+nFn7cWbtx5m1n37JzPfsmZmZmZmZdSCf2TMzMzMzM+tALvbMzMzMzMw6kIu9FpN0i6RXJa3Ntc2V9KykNZJ+I2m/3LIrJL0gab2kk3PtEyT1pWXzJKnZx7InqJVXbtksSSFpeK7NebVYUWaSLkq5rJN0Ta7dmbVYwc/FcZKelLRK0gpJk3PLnFkLSRotabGkZ9LnaWZqP0DSo5KeT1/3z32PM2uhOpl5/FFSRZnllnsMUjL1MmvqGCQiPLVwAo4HxgNrc21Tga40fzVwdZofC6wGBgFjgBeBAWnZcuBYQMDDwCmtPrZOnGrlldpHAwuBTcBw51WeqeAz1gM8BgxKr0c4s/JMBZk9UnnPgVOBJc6sHBMwEhif5vcBnku5XANcntov9++y8kx1MvP4o6RTUWbptccgJZzqfM6aOgbxmb0Wi4g/AK9XtT0SETvSyyeBg9P8GcA9EfF2RGwAXgAmSxoJ7BsRT0T2L+J24MzmHMGepVZeyXXAZUD+iUfOqwQKMpsBXBURb6d1Xk3tzqwECjILYN80PxR4Oc07sxaLiK0R0Zvm3wSeAUaRZXNbWu02dr7/zqzFijLz+KO86nzOwGOQUqqTWVPHIC72yu9Csgoesn8gm3PLtqS2UWm+ut2aQNLpwEsRsbpqkfMqr8OBz0paJmmppEmp3ZmV18XAXEmbgWuBK1K7MysRSd3Ap4FlwEERsRWyQQ8wIq3mzEqkKrM8jz9KKp+ZxyDtoepz1tQxSNfud9v6m6TZwA7gzkpTjdWiTrv1M0l7A7PJLn15z+Iabc6rHLqA/YFjgEnAvZI+hjMrsxnAJRGxQNLZwM3A53BmpSFpCLAAuDgi/l7nlhJnVhLVmeXaPf4oqXxmZBl5DFJyNX42NnUM4jN7JSXpAuA04Lx0yhaySn50brWDyS5l2sLOSy3y7db/DiW7rnq1pI1k732vpI/gvMpsC3B/ZJYD/wWG48zK7ALg/jR/H1B5QIszKwFJe5ENZu6MiEpOf06XH5G+Vi5VcmYlUJCZxx8lViMzj0FKruBz1tQxiIu9EpL0eeDbwOkR8c/cogeBcyQNkjQGOAxYni6PeVPSMenpPOcDDzS943ugiOiLiBER0R0R3WQfyPER8QrOq8x+C0wBkHQ4MBDYhjMrs5eBE9L8FOD5NO/MWiy9vzcDz0TEj3OLHiQr0klfH8i1O7MWKsrM44/yqpWZxyDlVudnY3PHII0+ycVTvz2p525gK/AO2Yf0q2Q3ZG4GVqXphtz6s8mezrOe3JN4gInA2rRsPqBWH1snTrXyqlq+kfQkLOdVjqngMzYQ+FXKoBeY4szKMxVkdhzwNNmTypYBE5xZOaaUTQBrcr+3TgWGAYvICvNFwAHOrBxTncw8/ijpVJRZ1Toeg5RoqvM5a+oYRGkDZmZmZmZm1kF8GaeZmZmZmVkHcrFnZmZmZmbWgVzsmZmZmZmZdSAXe2ZmZmZmZh3IxZ6ZmZmZmVkHcrFnZmb9StIwSavS9Iqkl3KvB1ate7GkvRvY5hJJE2u0nyZppaTVkv4o6Rsf5LHsLklzqo77qt3Yxn6SvrmLdc6SFJKO2P3emplZp/CfXjAzs6aRNAfYHhHXFizfCEyMiG272M4SYFZErMi17QVsAiZHxBZJg4DuiFj/AXW/Vj+6ImJHA+vNoc5xN7ivbuChiDiqzjr3AiOBRRExp8byARHxn93tg5mZtRef2TMzs6aTdFI6A9cn6RZJgyR9C/gosFjS4rTe9ZJWSFon6Xu72Ow+QBfwF4CIeLtS6EkaI+kJSU9J+r6k7an9REkP5fo1X9L0NP/dtP5aSTdJUmpfIulKSUuBmZImSFoq6WlJCyWNbPA9GCBpbtrHmvxZSEmX5torx30VcGg6Mzi3xvaGAJ8h+yP05+TaT5S0WNJdQF/RfiUNkbRIUm/K5YxGjsPMzMrLxZ6ZmTXbYOBW4AsR8SmyAm1GRMwDXgZ6IqInrTs7IiYCRwMnSDq6aKMR8TrwILBJ0t2SzpNU+T33U+D6iJgEvNJgP+dHxKR0Ju1DwGm5ZftFxAnAPOBnwLSImADcAvygYHuX5C7jPJmsKHsj9WkS8PVUlE4FDgMmA+OACZKOBy4HXoyIcRFxaY3tnwn8PiKeA16XND63bDLZezm2aL/AW8BZETEe6AF+VClwzcysPbnYMzOzZhsAbEhFCcBtwPEF654tqRdYCRwJjK234Yj4GnASsByYRVZ8QXbG6+40f0eD/eyRtExSHzAl7b/i1+nrJ4CjgEclrQK+AxxcsL3rUqE2LiIWAlOB89P3LQOGkRV5U9O0EugFjkjtu3IucE+avye9rlgeERvSfNF+BVwpaQ3wGDAKOKiB/ZqZWUl1tboDZma2x/lHIyuls02zgEkR8VdJt5KdFawrIvrILle8A9gATK8sqrH6Dt79H5+D074HAz8nu39wc7rnLr/vyjEIWBcRxzZyTFUEXJQKv52N2Vm/H0bEjVXt3YUbkoaRFaRHSQqygjokXVbV33r7nQ4cCEyIiHfS/ZO7fL/NzKy8fGbPzMyabTDQLenj6fWXgaVp/k2ye+8A9iUrUt6QdBBwSr2NpnvOTsw1jSN7YAvA4+y8j+283DqbgLHpnsGhZGcFK30E2JbuhZtWsNv1wIGSjk192EvSkQXrVlsIzEgPlkHS4ZI+nNovTPtF0ihJI3j3e1NtGnB7RBwSEd0RMZqs0D3ufex3KPBqKvR6gEMaPA4zMyspn9kzM7Nmewv4CnCfpC7gKeCGtOwm4GFJWyOiR9JKYB3wJ7KCrR4Bl0m6EfgXWaE4PS2bCdwlaSawoPIN6azdvcAa4HmySyeJiL9J+gXQB2xMfXyPiPi3pGnAvFQsdgE/SX3elV8C3UBvujfuNeDMiHhE0ieBJ9Itc9uBL0XEi5Iel7QWeLjqvr1zyR7gkrcA+CI7Lzmtu1/gTuB3klYAq4BnGzgGMzMrMf/pBTMz2+NI2h4RQ1rdDzMzs/7kyzjNzMzMzMw6kM/smZmZmZmZdSCf2TMzMzMzM+tALvbMzMzMzMw6kIs9MzMzMzOzDuRiz8zMzMzMrAO52DMzMzMzM+tA/wNaUCIIeIILEwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 1080x720 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "def plot_scatter_chart(df,location):\n",
    "    bhk2 = df[(df.location==location) & (df.bhk==2)]\n",
    "    bhk3 = df[(df.location==location) & (df.bhk==3)]\n",
    "    matplotlib.rcParams['figure.figsize'] = (15,10)\n",
    "    plt.scatter(bhk2.total_sqft,bhk2.price,color='blue',label='2 BHK', s=50)\n",
    "    plt.scatter(bhk3.total_sqft,bhk3.price,marker='+', color='green',label='3 BHK', s=50)\n",
    "    plt.xlabel(\"Total Square Feet Area\")\n",
    "    plt.ylabel(\"Price (Lakh Indian Rupees)\")\n",
    "    plt.title(location)\n",
    "    plt.legend()\n",
    "    \n",
    "plot_scatter_chart(df7,\"Rajaji Nagar\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA3sAAAJcCAYAAABAE73ZAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdfZDld10n+vdnkpCQmdYkJJgwgU0kw5IHYcTm4a5ztSNCIIWiPDhx2b1kyRaswCqJUsCu94K67KZARQtxuYC7EyMuMwJKZEkEgjPsLAh3glEg0Zopk5ghkYSQQGfIw0zme/843ZnOTHfP6Zlz+nT/+vWq6jp9vr/z8OkpoOrN5/P7fqu1FgAAALpl1agLAAAAYPCEPQAAgA4S9gAAADpI2AMAAOggYQ8AAKCDhD0AAIAOEvYAoE9V9Y6q+qN5rt9aVT95hJ+9tar+7ZFXBwCPJewBsKLMFsiq6tKq2j6qmgBgGIQ9AACADhL2AGCGqnpSVX2squ6uqluq6hcPeskJVbW5qiar6itV9cyDrj+7qm6qqnur6r9X1QlTn3tyVX1y6nPvnfr9zMX5qwBYiYQ9AJhSVauS/HmSv0myNsnzk7ypqi6a8bKXJvmTJKck+eMkf1ZVx824/qokFyV5apKnJfnVqfVVSf57kn+W5ClJHkjye0P7YwBY8YQ9AFaiP6uq+6Z/kvz+1Pqzk5zWWvv11trDrbV/SPLBJJfMeO8NrbWPttb2JvntJCcked6M67/XWru9tfbtJO9M8vNJ0lq7p7X2sdba91prk1PXfny4fyYAK9mxoy4AAEbgZ1prn51+UlWXJvm36XXdnjQVAKcdk+R/zXh++/QvrbX9VbU7yZNmu57ktulrVXVikvckeVGSk6euj1XVMa21R476LwKAgwh7AHDA7Uluaa2tm+c1T57+ZWrs88wkd8x2Pb1xzelrv5zknyd5bmvtn6pqfZK/TlKDKBwADmaMEwAO+HKS71bVW6rq8VV1TFVdUFXPnvGaH6mql1XVsUnelOShJH814/obqurMqjolyX9IsnlqfSy9+/Tum7r29uH/OQCsZMIeAEyZGqf8qSTrk9yS5FtJPpTk+2e87BNJNia5N8m/TvKyqfv3pv1xkk8n+Yepn/80tf47SR4/9Zl/leS6of0hAJCkWmujrgEAAIAB09kDAADoIGEPAACgg4Q9AACADhL2AAAAOmhZn7N36qmntrPOOmvUZQAAAIzEDTfc8K3W2mmzXVvWYe+ss87Kjh07Rl0GAADASFTVbXNdM8YJAADQQcIeAABABwl7AAAAHbSs79mbzd69e7N79+48+OCDoy5lpE444YSceeaZOe6440ZdCgAAMAKdC3u7d+/O2NhYzjrrrFTVqMsZidZa7rnnnuzevTtnn332qMsBAABGoHNjnA8++GCe8IQnrNiglyRVlSc84QkrvrsJAAArWefCXpIVHfSm+TcAAICVrZNhDwAAYKUT9gbs9ttvz4UXXphzzz03559/fn73d3931te94x3vyNq1a7N+/fo8/elPzy/8wi9k//79SZJLL700H/3oRx/z+jVr1iRJbr311lxwwQWPrn/wgx/Ms571rNx7771D+osAAIDlaMWHvcnJ5EMfSt7ylt7j5OTRfd6xxx6b3/qt38rNN9+cv/qrv8r73ve+3HTTTbO+9vLLL8+NN96Ym266KV/96lezbdu2BX3X1Vdfnfe+97359Kc/nZNPPvnoCgcAADqlc7txLsT27cnFFyf79yd79iSrVydXXJF86lPJhg1H9plnnHFGzjjjjCTJ2NhYzj333HzjG9/IeeedN+d7Hn744Tz44IMLCmxbtmzJlVdemeuvvz6nnnrqkRULAAB01ort7E1O9oLe5GQv6CW9x+n1++8/+u+49dZb89d//dd57nOfO+v197znPVm/fn3OOOOMPO1pT8v69esfvfbmN78569evf/Rnpttuuy1vfOMb8+lPfzqnn3760RcKAAB0zooNe5s39zp6s9m/v3f9aNx///15+ctfnt/5nd/J933f9836mukxzrvuuit79uzJRz7ykUevvfvd786NN9746M9Mp512Wp7ylKdky5YtR1ckAADQWSs27O3ceaCjd7A9e5Jdu478s/fu3ZuXv/zledWrXpWXvexlh339cccdlxe96EX5/Oc/39fnn3jiibn22mvz/ve/Px/+8IePvFAAAKCzVuw9e+vW9e7Rmy3wrV6dnHPOkX1uay2XXXZZzj333FxxxRV9v+cLX/jCIeOa8znttNNy3XXXZWJiIqeeemouuuiiIysYAADopBXb2du4MVk1x1+/alXv+pH43//7f+fqq6/O5z73uUfvt/vUpz4162un79m74IILsm/fvrz+9a9f0HedffbZueaaa/Ka17wmX/rSl46sYAAAoJOqtTbqGo7Y+Ph427Fjx2PWbr755px77rl9vX+23ThXrTq63TiXkoX8WwAAAMtPVd3QWhuf7dqKHeNMeoHujjt6m7Hs2tUb3dy4MZk6vxwAAGDZWtFhL+kFu8suG3UVAAAAg7Vi79kDAADox8SmiUxsmhh1GQsm7AEAAHSQsAcAANBBK/6ePQAAgIPNHNvcdtu2Q9a2Xrp1cQs6Ajp7A/bggw/mOc95Tp75zGfm/PPPz9vf/vZZX/eOd7wja9euzfr16/P0pz89v/ALv5D9+/cnSS699NJ89KMffczr10xtEXrrrbfmggsueHT9gx/8YJ71rGfl3nvvHdJfBAAALEc6ezmQ0AeRzo8//vh87nOfy5o1a7J3795s2LAhL37xi/O85z3vkNdefvnl+ZVf+ZXs378/P/ZjP5Zt27blwgsv7Pu7rr766rz3ve/N5z73uZx88slHXTsAANAzMxsMMi8sJmFvwKrq0S7c3r17s3fv3lTVvO95+OGH8+CDDy4osG3ZsiVXXnllrr/++px66qlHVTMAANA9xjiH4JFHHsn69evzxCc+MS94wQvy3Oc+d9bXvec978n69etzxhln5GlPe1rWr1//6LU3v/nNWb9+/aM/M91222154xvfmE9/+tM5/fTTh/q3AAAAy9OK7ewN84bLY445JjfeeGPuu+++/OzP/my+9rWvPeY+u2nTY5x79+7NK17xinzkIx/JJZdckiR597vfnVe84hWPvna6W5gkp512Wk455ZRs2bIll19++RHXCQAAHN5yG9+cprM3RCeddFImJiZy3XXXzfu64447Li960Yvy+c9/vq/PPfHEE3Pttdfm/e9/fz784Q8PolQAAKBjVmxnb1g3XN5999057rjjctJJJ+WBBx7IZz/72bzlLW+Z9z2ttXzhC184ZFxzPqeddlquu+66TExM5NRTT81FF110tKUDAAAdorM3YHfeeWcuvPDCPOMZz8izn/3svOAFL8hLXvKSWV87fc/eBRdckH379uX1r3/9gr7r7LPPzjXXXJPXvOY1+dKXvjSI8gEAgI6o1tqoazhi4+PjbceOHY9Zu/nmm3Puuecu6HOW61aqh3Mk/xYAAMDyUVU3tNbGZ7u2Ysc4Z+payAMAADDGCQAA0EGdDHvLeTR1UPwbAADAyta5sHfCCSfknnvuWdFhp7WWe+65JyeccMKoSwEAAEakc/fsnXnmmdm9e3fuvvvuUZcyUieccELOPPPMUZcBAACMSOfC3nHHHZezzz571GUAAACMVOfGOAEAABD2AAAAOknYAwAA6CBhDwAAoIOEPQAAgA4S9gAAADpI2AMAAOggYQ8AAKCDhD0AAIAOEvYAAAA6SNgDAADoIGEPAACgg4Q9AACADhL2AAAAOkjYAwAA6CBhDwAAoIOEPQAAgA4S9gAAADpoaGGvqk6oqi9X1d9U1der6tem1k+pqs9U1c6px5NnvOdtVbWrqv6+qi4aVm0AAABdN8zO3kNJfqK19swk65O8qKqel+StSa5vra1Lcv3U81TVeUkuSXJ+khcl+f2qOmaI9QEAAHTW0MJe67l/6ulxUz8tyUuTXDW1flWSn5n6/aVJPtJae6i1dkuSXUmeM6z6AAAAumyo9+xV1TFVdWOSu5J8prX2pSQ/0Fq7M0mmHp849fK1SW6f8fbdU2sHf+Zrq2pHVe24++67h1k+AADAsjXUsNdae6S1tj7JmUmeU1UXzPPymu0jZvnMD7TWxltr46eddtqgSgUAAOiURdmNs7V2X5Kt6d2L982qOiNJph7vmnrZ7iRPnvG2M5PcsRj1AQAAdM0wd+M8rapOmvr98Ul+MsnfJbkmyaunXvbqJJ+Y+v2aJJdU1fFVdXaSdUm+PKz6AAAAuuzYIX72GUmumtpRc1WSLa21T1bVF5NsqarLkvxjklcmSWvt61W1JclNSfYleUNr7ZEh1gcAANBZ1doht8UtG+Pj423Hjh2jLgMAAGAkquqG1tr4bNcW5Z49AAAAFpewBwAA0EHCHgAAQAcJewAAAB0k7AEAAHSQsAcAANBBwh4AAEAHCXsAAAAdJOwBAAB0kLAHAADQQcIeAABABwl7AAAAHSTsAQAAdJCwBwAA0EHCHgAAQAcJewAAAB0k7AEAAHSQsAcAANBBwh4AAEAHCXsAAAAdJOwBAAB0kLAHAADQQcIeAABABwl7AAAAHSTsAQAAdJCwBwAA0EHCHgAAQAcJewAAAB0k7AEAAHSQsAcAANBBwh4AAEAHCXsAAAAdJOwBAAB0kLAHAADQQcIeAABABwl7AAAAHSTsAQAAdJCwBwAA0EHCHgAAQAcJewAAAB0k7AEAAHSQsAcAANBBwh4AAEAHCXsAAAAdJOwBAAB0kLAHAADQQcIeAABABwl7AAAAHSTsAQAAdJCwBwAA0EHCHgAAQAcJewAAAB0k7AEAAHSQsAcAANBBwh4AAEAHCXsAAAAdJOwBAAB0kLAHAADQQcIeAABABwl7AAAAHSTsAQAAdJCwBwAAdM7EpolMbJoYdRkjJewBAAB0kLAHAADQQceOugAAAIBBmDm2ue22bYesbb106+IWNGI6ewAAAB2kswcAAHTCzM7ddEdvpXXzZtLZAwAA6CBhDwAAoIOMcQIAAJ2zksc3p+nsAQAAdJCwBwAA0EHCHgAAQAcJewAAAB0k7AEAAHSQsAcAANBBwh4AAEAHCXsAAAAdJOwBAAB0kLAHAKxYE5smMrFpYtRlAAzF0MJeVT25qv6yqm6uqq9X1S9Nrb+jqr5RVTdO/Vw84z1vq6pdVfX3VXXRsGoDAADoumOH+Nn7kvxya+0rVTWW5Iaq+szUtfe01n5z5our6rwklyQ5P8mTkny2qp7WWntkiDUCAAB00tDCXmvtziR3Tv0+WVU3J1k7z1temuQjrbWHktxSVbuSPCfJF4dVIwCw8swc29x227ZD1rZeunVxCwIYkkW5Z6+qzkryw0m+NLX0xqr626r6b1V18tTa2iS3z3jb7swSDqvqtVW1o6p23H333UOsGgAAYPka5hhnkqSq1iT5WJI3tda+W1X/NclvJGlTj7+V5DVJapa3t0MWWvtAkg8kyfj4+CHXAQDmM7NzN93R080Dumionb2qOi69oPfh1trHk6S19s3W2iOttf1JPpjeqGbS6+Q9ecbbz0xyxzDrAwAA6Kph7sZZSf4gyc2ttd+esX7GjJf9bJKvTf1+TZJLqur4qjo7ybokXx5WfQAAAF02zDHOH03yr5N8tapunFr7D0l+vqrWpzeieWuS1yVJa+3rVbUlyU3p7eT5BjtxAgDDZHwT6LJh7sa5PbPfh/eped7zziTvHFZNAAAAK8Wi7MYJAADA4hL2AAAAOkjYAwAA6CBhDwAAoIOEPQAAgA4S9gAAADpI2AMAAOggYQ8AAGAeE5smMrFpYtRlLJiwBwAA0EHCHgAAQAcdO+oCAAAAlpqZY5vbbtt2yNrWS7cubkFHQGcPAACgg3T2AAAADjKzczfd0VsO3byZdPYAAAA6SNgDAADoIGOcAAAA81hu45vTdPYAAAA6SNgDAADoIGEPAACgg4Q9AACADhL2AAAAOkjYAwAA6CBhDwAAoIOEPQAAgA4S9gAAADpI2AMAAOggYQ8AAKCDhD0AAIAOEvYAAAA6SNgDAADoIGEPAACgg4Q9AACADhL2AAAAOkjYAwAA6CBhDwAAoIOEPQAAgA4S9gAAADpI2AMAAOggYQ8AAKCDhD0AAIAOEvYAAAA6SNgDAADoIGEPAACgg4Q9AACADhL2AAAAOkjYA4AOmtg0kYlNE6MuA4AREvYAAAA6SNgDAADooGNHXQAAMBgzxza33bbtkLWtl25d3IIAGCmdPQAAgA7S2QOAjpjZuZvu6OnmAaxcfXf2qmp1VR0zzGIAAAAYjDnDXlWtqqp/WVX/s6ruSvJ3Se6sqq9X1burat3ilQkAAMBCzDfG+ZdJPpvkbUm+1lrbnyRVdUqSC5NcWVV/2lr7o+GXCQAshPFNAOYLez/ZWtt78GJr7dtJPpbkY1V13NAqAwAA4IjNOcY5HfSq6qlVdfzU7xNV9YtVddLM1wAAALC09LNBy8eSPFJV5yT5gyRnJ/njoVYFAADAUekn7O1vre1L8rNJfqe1dnmSM4ZbFgAAAEejn7C3t6p+Psmrk3xyas29egAAAEtYP2Hv3yT5P5K8s7V2S1WdncQOnAAAAEvYfLtxJklaazdV1VuSPGXq+S1Jrhx2YQAAABy5w3b2quqnktyY5Lqp5+ur6pphFwYAAMCR62eM8x1JnpPkviRprd2Y3o6cAAAALFH9hL19rbXvHLTWhlEMAAAAg3HYe/aSfK2q/mWSY6pqXZJfTPKF4ZYFAADA0eins/fvk5yf5KH0DlP/TpI3DbMoAAAAjk4/u3F+L8l/rKr/3Frbswg1AQAAcJT62Y3zX1TVTUlunnr+zKr6/aFXBgAAwBHrZ4zzPUkuSnJPkrTW/ibJjw2zKAAAAI5OP2EvrbXbD1p6ZAi1AAAAMCD97MZ5e1X9iyStqh6X3m6cNw+3LAAAAI5GP529f5fkDUnWJvlGkvVTzwEAAFii+tmN81tJXrUItQAAADAg/ezG+YNV9edVdXdV3VVVn6iqH1yM4gAAADgy/Yxx/nGSLUnOSPKkJH+S5H8MsygAAACOTj9hr1prV7fW9k39/FGSNuzCAAAAOHL97Mb5l1X11iQfSS/kbUzyP6vqlCRprX17iPUBAABwBPoJexunHl930Ppr0gt/7t8DAABYYvrZjfPsxSgEAACAwTls2Kuq/2u29dbaHw6+HAAAAAahnw1anj3j5/9M8o4kP324N1XVk6vqL6vq5qr6elX90tT6KVX1maraOfV48oz3vK2qdlXV31fVRUf0FwEAANDXGOe/n/m8qr4/ydV9fPa+JL/cWvtKVY0luaGqPpPk0iTXt9aunNr45a1J3lJV5yW5JMn56R3x8Nmqelpr7ZEF/UUAACMysWkiSbL10q2L8j6A+fTT2TvY95KsO9yLWmt3tta+MvX7ZJKbk6xN8tIkV0297KokPzP1+0uTfKS19lBr7ZYku5I85wjqAwAAWPH6uWfvz3PgXL1VSc5L72D1vlXVWUl+OMmXkvxAa+3OpBcIq+qJUy9bm+SvZrxt99TawZ/12iSvTZKnPOUpCykDAABgxejn6IXfnPH7viS3tdZ29/sFVbUmyceSvKm19t2qmvOls6wdcnh7a+0DST6QJOPj4w53BwBGanoEM0m23bbtkLW5RjOP9H0A/Zp3jLOqjkny9dbattbatiT/X5KLq+rmfj68qo5LL+h9uLX28anlb1bVGVPXz0hy19T67iRPnvH2M5Pc0fdfAgAAwKPm7OxV1SVJ/t8ke6pqZ3q7cF6dXuB71eE+uHotvD9IcnNr7bdnXLomyauTXDn1+IkZ639cVb+d3gYt65J8eYF/DwDAoprZgVvIRitH+j6Afs03xvmrSX6ktbarqp6V5ItJLmmt/Wmfn/2jSf51kq9W1Y1Ta/8hvZC3paouS/KPSV6ZJK21r1fVliQ3pTcu+gY7cQIAAByZ+cLew621XUkydXzCLQsIemmtbc/s9+ElyfPneM87k7yz3+8AAJaPldC92v6P20ddAsCj5gt7T6yqK2Y8XzPz+UGjmQAAK96ax605ovd1OQADozNf2PtgkrF5ngMAALBEzRn2Wmu/tpiFAADdsxKOFzjpypMe/f07D33nkLX73nrfotcEkBzm6AUAAACWp34OVQcAOCIr4XiBmZ276Y6ebh6wFOjsAQAAdNBhO3tVdXySlyc5a+brW2u/PryyAAAAOBr9jHF+Isl3ktyQ5KHhlgMAdFXXxjdnY3wTWEr6CXtnttZeNPRKAAAAGJh+7tn7QlX90NArAYAFmtg08Zht/AGAA/rp7G1IcmlV3ZLeGGclaa21Zwy1MgAAAI5YP2HvxUOvAgAAgIE6bNhrrd2WJFX1xCQnDL0iAJjHzLHNbbdtO2RtJWwCAgD9OOw9e1X101W1M8ktSbYluTXJtUOuCwAAgKPQzxjnbyR5XpLPttZ+uKouTPLzwy0LAGY3s3M33dHTzQOAQ/WzG+fe1to9SVZV1arW2l8mWT/kugAAADgK/XT27quqNUk+n+TDVXVXkn3DLQsAAICj0U/Ye2mSB5NcnuRVSb4/ya8PsygA6IfxTQCYWz+7ce6Z8fSqIdYCAADAgMwZ9qpqe2ttQ1VNJmkzL6V3qPr3Db06AAAAjsicYa+1tmHqcWzxygEAAGAQ5uvsnTLfG1tr3x58OQAAAAzCfPfs3ZDe+GYleUqSe6d+PynJPyY5e+jVAQAAcETmPGevtXZ2a+0Hk/xFkp9qrZ3aWntCkpck+fhiFQgAAMDC9XOo+rNba5+aftJauzbJjw+vJAAAAI5WP+fsfauqfjXJH6U31vmvktwz1KoAgE6b2DSRxFmJAMPUT2fv55OcluRPk/xZkidOrQEAALBE9XOo+reT/NIi1AIAAMCAHDbsVdXTkvxKkrNmvr619hPDKwsA6Jrp0c0k2XbbtkPWjHQCDFY/9+z9SZL3J/lQkkeGWw4AAACD0E/Y29da+69DrwQA6LSZnTsbtAAMXz8btPx5Vb2+qs6oqlOmf4ZeGQAAAEesn87eq6ce3zxjrSX5wcGXAwAAwCD0sxvn2YtRCACwchjfBBi+OcNeVb1svje21j4++HIAYGlxbxkAy9V8nb2fmudaSyLsAQAALFFzhr3W2r9ZzEIAAAAYnH42aAGAFcXh3wB0QT9HLwAAALDM6OwBwEEc/g1AF/QV9qrqXyQ5a+brW2t/OKSaAAAAOEqHDXtVdXWSpya5MckjU8stibAHAACwRPXT2RtPcl5rrQ27GABYaoxvArBc9bNBy9eSnD7sQgAAABicOTt7VfXn6Y1rjiW5qaq+nOSh6euttZ8efnkAAAAcifnGOH9z0aoAAABgoOYMe621bUlSVS9urV0781pV/bsk24ZcGwAAAEeon3v2/u+q+onpJ1X1liQvHV5JAAAAHK1+duP86SSfrKo3J3lRkqdPrQEAALBEHTbstda+VVU/neSzSW5I8grHMAAAACxt8+3GOZnebpzTHpfkB5O8oqpaa+37hl0cAAAAR2a+DVrGFrMQAAAABqefe/ZSVScnWZfkhOm11trnh1UUAPRjYtNEkmTrpVtHWgcALEWHDXtV9W+T/FKSM5PcmOR5Sb6Y5Cfmex8AAACj08/RC7+U5NlJbmutXZjkh5PcPdSqAAAAOCr9jHE+2Fp7sKpSVce31v6uqv750CsDgFlMj24mybbbth2yZqQTAHr6CXu7q+qkJH+W5DNVdW+SO4ZbFgAAAEejFnJkXlX9eJLvT3Jta23v0Krq0/j4eNuxY8eoywBgRGzQAsBKV1U3tNbGZ7vW126c01pr26Y+8B+TPGUAtQEAADAE/WzQMpsaaBUAAAAM1II6ezP0P/sJAENifBMA5jZn2KuqK+a6lGTNcMoBAABgEObr7I3Nc+13B10IAAAAgzNn2Gut/dpiFgIAAMDgzLlBS1X9alWdPM/1n6iqlwynLAAAAI7GfGOcX03yyap6MMlXktyd5IQk65KsT/LZJP956BUCAACwYPONcX4iySeqal2SH01yRpLvJvmjJK9trT2wOCUCAACwUIc9eqG1tjPJzkWoBQAAgAE50kPVAQAAWMKEPQAAgA4S9gCGbGLTRCY2TYy6DABghTls2Kuqp1XV9VX1tannz6iqXx1+aQAAABypfjp7H0zytiR7k6S19rdJLhlmUQAAABydw+7GmeTE1tqXq2rm2r4h1QPQCTPHNrfdtu2Qta2Xbl3cggCAFaefzt63quqpSVqSVNUrktw51KoAAAA4Kv109t6Q5ANJnl5V30hyS5J/NdSqAJa5mZ276Y6ebh4AsJj6OVT9H5L8ZFWtTrKqtTY5/LIAAAA4Gv3sxvmfq+qk1tqe1tpkVZ1cVf9pMYoDAADgyPRzz96LW2v3TT9prd2b5OLhlQTQLVsv3WqEEwBYdP2EvWOq6vjpJ1X1+CTHz/N6AJiXg+YBYPj6CXt/lOT6qrqsql6T5DNJrjrcm6rqv1XVXdOHsU+tvaOqvlFVN079XDzj2tuqaldV/X1VXXQkfwwAAAA9/WzQ8q6q+mqS5yepJL/RWvuLPj57U5LfS/KHB62/p7X2mzMXquq89A5qPz/Jk5J8tqqe1lp7pI/vAQAA4CD9HL2Q1tq1Sa5dyAe31j5fVWf1+fKXJvlIa+2hJLdU1a4kz0nyxYV8JwBLl4PmAWBxzTnGWVXbpx4nq+q7M34mq+q7R/Gdb6yqv50a8zx5am1tkttnvGb31Npsdb22qnZU1Y677777KMoAAADorjk7e621DVOPYwP8vv+a5DeStKnH30rymvTGQw8pYY66PpDeIe8ZHx+f9TUALD0OmgeAxTXvBi1VtWrmBitHq7X2zdbaI621/Uk+mN6oZtLr5D15xkvPTHLHoL4XAABgpZk37E2Fsr+pqqcM4suq6owZT382yXSQvCbJJVV1fFWdnWRdki8P4jsBAABWon42aDkjyder6stJ9kwvttZ+er43VdX/SDKR5NSq2p3k7Ukmqmp9eiOatyZ53dRnfb2qtiS5Kcm+JG+wEydAdxnfBIDhq9bmv+2tqn58tvXW2rahVLQA4+PjbceOHaMuA5SUBVAAAB+gSURBVAAAYCSq6obW2vhs1+bs7FXVCUn+XZJzknw1yR+01vYNp0QAAAAGab579q5KMp5e0HtxejtnAgAAsAzMd8/eea21H0qSqvqD2DAFAABg2Zivs7d3+hfjmwAAAMvLfJ29Z1bVd6d+rySPn3peSVpr7fuGXh0AAABHZM6w11o7ZjELAQAAYHDmPVQdAACA5UnYAwAA6CBhD2AFmtg0kYlNE6MuAwAYImEPAACgg4Q9AACADprv6AUAOmTm2Oa227Ydsrb10q2LWxAAMFQ6ewAAAB2kswewDE135BbSjZv52iN5PwCwvOjsAQAAdJCwBwAA0EHGOAGWiUFusGJ8EwC6T2cPYMAcWA4ALAU6ewDLRJc2WDnpypOSJPe99b4RVwIA3aWzBwAA0EE6ewAD4MByAGCpEfYAlqHlGB6nRzeT5DsPfeeQNSOdADBYwh7AAHTpfjoAoBuEPQAWxczOnQ1aAGD4bNACAADQQTp7AANmfBMAWAqEPYAOWS73CxrfBIDhM8YJAADQQcIeAABABxnjBFjmHOgOAMxGZw8AAKCDdPYAlrmFHOi+XDZwAQCOns4eAABABwl7AAAAHWSME6BDZhvPtIELAKxMOnsAAAAdpLMHsEQMa/OUhWzgAgB0h84eAABAB+nsAUMzOZls3pzs3JmsW5ds3JiMjY26qm476cqTkiT3vfW+EVcCAIyasAcMxfbtycUXJ/v3J3v2JKtXJ1dckXzqU8mGDaOubulY7M1TjG8CwMphjBMYuMnJXtCbnOwFvaT3OL1+//2jrQ8AYCXQ2QMGbvPmXkdvNvv3965fdtni1rRUDWLzlOnRzST5zkPfOWTNSCcArEw6e8DA7dx5oKN3sD17kl27FreexTaxaeIxo5gAAKOgswcM3Lp1vXv0Zgt8q1cn55yz+DUtphv/6cZF/b6ZnTsbtAAA03T2gIHbuDFZNcf/uqxa1bvOobZeutUGKgDAwOjsAQM3NtbbdfPg3ThXreqtr1kz6goHb7qjtv709Y/eNzfMXTWXEwe5A8BoCHvAUGzYkNxxR28zll27eqObGzd2M+glyf0P97YYnTnCudjjnInxTQDgAGEPGJo1a1bOrptrHtdLsetPX//oeXnrT18/ypIAgBVO2AM4QjPHNKdHN5PkmDomycoeW1zsw+IBgEMJewAD9kh7ZNQlAAAIewBHaq4D0evXajQFLSGDOCweADg6jl4AAADoIJ096KDJyd4umDt39g4437ixdxzCUrHU6zsS0/elzezqzfy9vb0tek0AwMom7EHHbN9+6Pl2V1zRO99uw4ZRV7f062PwjG8CwGhUa8v3/20eHx9vO3bsGHUZsGRMTiZr1/YeDzY21jv3bpTn3C31+gZluqOnmwcADFtV3dBaG5/tmnv2oEM2b+51zGazf3/v+igt9foAALpE2IMO2bmzNxo5mz17kl27Freegy31+gAAusQ9e9Ah69b17oGbLVCtXp2cc87i1zTTUq9vUH78n/34qEsAANDZgy7ZuDFZNcd/q1et6l0fpaVeHwBAl+jsQYeMjfV2tTx4t8tVq3rro978ZKnXdzSmDw5PDhzDMHPNjpQAwGIT9qBjNmzo7Wq5eXPvHrhzzul1zJZKkFrq9QEAdIWjF2ABungYOIM33dHTzQMAhm2+oxd09qBPDgMHAGA5sUEL9GFyshf0JicP7CS5Z8+B9fvvH219AABwMJ096EM/h4Ffdtni1sTSZXwTAFgKhD3ow0o5DNw9iQAA3SHsQR9WwmHg7kkEAOgWu3FCHyYnk7Vre48HGxvrHSWwnI8OGNbfp1MIADBc8+3GaYMW6MP0YeBjY72OV9J7nF5fzkEv6e+exIXavr0XIN/0puRd7+o9rl3bWwcAYPiMcUKfunwY+KDvSZy5e+nMz0l668u9EwoAsBwIe7AAa9Z0c9fNQd+TaPdSAIDRM8YJZOPGZNUc/2uwalXv+kKslN1LAQCWMmEPVrjpTVR+6qeS449PTjyxt3409yROdwpn05XdSwEAljpjnLCCzXbcwiOPJK96VXLhhUd+T+LGjb1jG2ZzJJ1CAAAWTmcPVqiZm6hMj1zu2ZM8+GByzTVHt/lM13cvBQBYDnT2YIUa9iYqXd69FABgORD2YIVajE1Uurp7KQDAcmCME1Yom6gAAHSbsAcr1KCPWwAAYGkxxgkdNn2sws6dvU7exo29TVKSA5ulvPjFyd69yUMP9Y5eOO44m6gAAHTB0Dp7VfXfququqvrajLVTquozVbVz6vHkGdfeVlW7qurvq+qiYdUFK8X27cnatcmb3pS86129x7Vre+sHa+2xj4Pyla8kT31qbyz0qU/tPQcAYHEMc4xzU5IXHbT21iTXt9bWJbl+6nmq6rwklyQ5f+o9v19VxwyxNui0uY5VmF6///7H/v7ww73XPPxw7/n0+tHYuDH5kR9J/uEfku99r/f4Iz9iPBQAYLEMLey11j6f5NsHLb80yVVTv1+V5GdmrH+ktfZQa+2WJLuSPGdYtUHX9XOswnyvefjh5JWvTD70oV4oXKivfCXZsmX2a1u2JH/7twv/TAAAFmaxN2j5gdbanUky9fjEqfW1SW6f8brdU2uHqKrXVtWOqtpx9913D7VYWK76OVZhvtc89FBy3XXJa1+bnH767KOf83nlK+e//rKXLezzAABYuKWyG2fNsjbr3UOttQ+01sZba+OnnXbakMuC5amfYxXme8201nojmBdeuLCxzn/6p6O7DgDA0VvssPfNqjojSaYe75pa353kyTNed2aSOxa5NuiMjRvnH+PcuHH+oxcOtm9f8t739v/9p59+dNcBADh6ix32rkny6qnfX53kEzPWL6mq46vq7CTrknx5kWuDTqnZ+uUz1qePXhgbS0488fCf93u/1/93/8mfzH/94x/v/7NGZWLTRCY2TYy6DACAIzbMoxf+R5IvJvnnVbW7qi5LcmWSF1TVziQvmHqe1trXk2xJclOS65K8obX2yLBqg67bvHn+sLd5c+/3DRuSO+5IXvGKw3/mQsY4n/Ws5Od+bvZrP/dzyTOe0f9nAQBwZIZ2qHpr7efnuPT8OV7/ziTvHFY90BXH/nrvv7b7/p99c76mnw1apq1Z099Y5ROecOjafIe2b96c/Mf/2NuM5Z/+qfcdH/+4oNcV013PrZduHWkdAMDchhb2gNGZ3nxltsA3vUHLwa8/5pjkkXn66S984WOfb9/eO49v//7e96xenVxxRW80dMOG3mue8YzHBsulbubY5rbbth2yJtgAAMvJUtmNExig+TZfWbXq0IPNN25Mjp3n//o59tjk2c8+8LyfQ9sBABgtnT1YBqZHN5PkkanbWWeuHTzSOb35ysGdt1Wreutr1uSQ119zTXLRRbN//+Mf/9iA2M+h7Zdd1v/ft1TM7NwZUzyUzicALC/CHnTU9OYrmzf3RinPOacX2A4OetNe+MLkL/4ieelLe+Oce/fOHRAXck8gAACjIezBMrDv/9n36GYor919bKqSey/f9+hmKHNZs+bQDtt8m6q88IXJ3XcfPiAu9J5AukHnEwCWl2qtjbqGIzY+Pt527Ngx6jJg6B6zGcov9/4/mrH37HvMZigL/pyDRjsX8jmTk8natb3Hg42N9TqKc3UQ6QZhDwCWhqq6obU2Pts1G7TAEjfbZigz1/vdDGWQm6rMPJB99ere2urVB9YFPQCA0TPGCUvcIZuh/PqBzVgWshnK5s29+/Bms3fvwjdVWeg9gXSLjh4ALH3CHixxh2yG8taTeo9X3regzVC+9rXkwQdnv/bgg8lNNy28ttnuCewCI4oAQBcY44QlbnozlNksZDOUe++d//o99yysLgAAljZhD5a4hR6QPpdTTpn/+hOesLC6AABY2oxxwhI3Npbs+5WTkgemFk74Tu/xrSdl3+OTM38vue+t9x32c84/PznhhNlHOU84ITnvvMHVvBw5MBwA6BqdPVjiJieTBx6Y/doDDyT9np6ycWNy3HGzXzvuuP47hAAALA86e7DEbd6crH7vfQc2aZmxQcvq1clv/25/nzN9LMJc5+yt9F00HRgOAHSNsAdL3CG7cc6wkN04E8clAACsJMIeLHHTu3HOFvgWshvntK4elwAAwGNV6/eGnyVofHy87dixY9RlwFBNTiZr1/YeDzY21uvU6cwBAKxMVXVDa218tms2aIElbvpeu7GxA+ftrV59YH05Bb2JTROP2eESAIDhMcYJy4B77QAAWChhD5aJUdxrV79WSZL29uU77g0AsFIJe8BQOawcAGA03LMHAADQQTp7QCYne/cD7tyZvOvEOuT69DhnsvCRToeVAwCMhrAHK9z27cnFFyf790+d5ff2qQuHZj4AAJYRYQ9WsMnJXtB7zBl+v9br3I2NJZO/bIMWAIDlStiDFWzz5l5HbzZzrR8N45sAAIvHBi2wgu3cOTW6OYs9e5Jo6AEALFvCHqxg69Ylq1fPfm316uRDT25GOAEAlilhD1awjRuTVXP8r8CqVb3rAAAsT8IerGBjY8mnPtV7nO7wrV59YH3NmtHWBwDAkbNBCywz02feDWq8csOG5I47epu17NqVnHNOr6Mn6AEALG/CHpA1a5LLLht1FQAADJIxTgAAgA7S2YNlYHp0c641O2YCAHAwnT0AAIAO0tmDZWBm527QG7SsJBObJpIkWy/dOtI6AAAWg84eAABABwl7AAAAHWSME5aZg8c3jSbOb/rfJ0m23bbtkDX/bgBAV+nsAQAAdJDOHtBpMzt3uqAAwEoi7MEyZDQRAIDDMcYJAADQQdXa8j2ra3x8vO3YsWPUZcBIGU0EAFi5quqG1tr4bNd09gAAADrIPXswIJOTyebNyc6dybp1ycaNydjYqKsCAGClMsYJA7B9e3Lxxcn+/cmePcnq1cmqVcmnPpVs2DDq6gAA6CpjnDBEk5O9oDc52Qt6Se9xev3++0dbHwAAK5OwB0dp8+ZeR282+/f3rgMAwGIT9uAo7dx5oKN3sD17kl27FrceAABIhD04auvW9e7Rm83q1ck55yxuPQAAkAh7cNQ2buxtxjKbVat61wEAYLEJe6wYE5smHj2AfJDGxnq7bo6NHejwrV59YH3NmoF/5VEZ1r8DAABLi3P2YAA2bEjuuKO3GcuuXb3RzY0bl17QAwBg5RD2YEDWrEkuu2zUVQAAQI+wR6fNHFfcdtu2Q9a2Xrp1cQsaEf8OAAArj3v2AAAAOqhaa6Ou4YiNj4+3HTt2jLoMlonpTtZK72L5dwAA6I6quqG1Nj7bNWOcLBmTk70NTnbu7J1dt3Fjb0fLrnwfAAAsJmGPJWH79uTii5P9+5M9e3pHF1xxRe/ogg0bhvN9L35xsndv8tBDyfHHJ5dfnlx77XC+DwAAFpsxTkZucjJZu7b3eLCxsd6RBoM8wmByMjn99OR73zv02oknJt/8piMTAABYHuYb47RBCyO3eXOvozeb/ft71wfpqqtmD3pJb/11r5s9eA7K5GTyoQ8lb3lL73GY3wUAwMol7DFyO3f2Rjdns2dP75DyQfrkJ+e/vnlzr9O4fftgvzfpfebatcmb3pS86129x2F9FwAAK5t79hi5det69+jNFvhWr07OOWdx63nkkWTy5ROZ2JTct35r3yOdh9vwZXKyd1/izE7e9N988cWDH1cFAGBl09lj5DZuTFbN8Z/EVat61wfpJS/p73Wt9T9C2k/HbrHHVQEAWNmEPUZubKy36+bYWK+Tl/Qep9cH3e169auTxz/+8K/bv7+/EdKZHbvpTt2ePQfW77+/t7bY46oAAKxsxjhZEjZs6I0xbt7cCz3nnNPr6A1jrHFsLPn0p3tHLzzwQG9sM0ly6cSBF521LUny8cdN5IubektzHULeT8fussuW3rgqAADdJuyxZKxZ0wtFi2HDhuTOO3s7c15xRfLww7O/7olPnP9zJieTj360v47dxo2975rNMMZVAQBY2YQ9Vqw1a5I3vCF55jOnDnT/k62PHuj+4CUT+aEfSv7XZVvnfP/0QfBzBcXksR276bHUgw+PX7VqOOOqAACsbMIeK95sI6SbkhxzzNzvmW1nzdkc3LFbzHFVAABWNmGvgw53BEAXHe3ffPAI6dWb5n/9fPfpJcnjHpccf/zsHbvFHFcFAGDlEvY6Znq0cOaY4BVX9ELHhg2jrm44hvE3z7UZy7T5dtZMkuc/P9myRccOAIDREfY6ZCUe2n0kf/NsXcBkYZ3Bw+2s+fKXd+/fGgCA5UXY65B+jwDokoX+zbN1AX/xF5Oq3k+/nUE7awIAsNQ5VL1DVuKh3Qv5m+c6/PyBB5LvfW/+A9EPttgHwQMAwELp7HXISjy0eyF/8+E2VTnY4bqhdtYEAGApE/Y6ZCWOFi7kbz7cpioH66cbamdNAACWKmOcHbISRwsX8jfv3r2wz+5qNxQAgJWhWmujruGIjY+Ptx07doy6jCXn/vsPjBaeeWbSWnL77d0+c2/m3zzbOOUddyRr1y7sM8fGurmDKQAA3VFVN7TWxme9Jux112w7T65a1e0z9+by6lcnf/iHc18/5pjkhBP8OwEAsLzMF/ZGcs9eVd2aZDLJI0n2tdbGq+qUJJuTnJXk1iQ/11q7dxT1dcFKPHNvPn/3d/Nff9azkte9zkYrAAB0xyg3aLmwtfatGc/fmuT61tqVVfXWqedvGU1py99in7k320HlS2lc9OlPT7785bmvn3eejVYAAOiWpbRBy0uTXDX1+1VJfmaEtSx7i3nm3vbtvfvh3vSm5F3v6j2uXdtbXyr+y3+Z//qVVy5OHQAAsFhGFfZakk9X1Q1V9dqptR9ord2ZJFOPT5ztjVX12qraUVU77r777kUqd/mZPn9uNoPcZXKug8oPdyj5YnvSk5L3vW/2a+97X3L66YtbDwAADNuowt6PttaeleTFSd5QVT/W7xtbax9orY231sZPO+204VW4zG3c2NtkZDaDPHOvn3HRpeL1r0/uvLO3Wcvzntd7vPPO3joAAHTNSO7Za63dMfV4V1X9aZLnJPlmVZ3RWruzqs5IctcoauuK6XPm5tqNc1CbjyzmuOggnH56smnTqKsAAIDhW/SwV1Wrk6xqrU1O/f7CJL+e5Jokr05y5dTjJxa7tq7ZsKG36+Z8588drelx0dkCn0PJAQBgdBb9nL2q+sEkfzr19Ngkf9xae2dVPSHJliRPSfKPSV7ZWvv2fJ/lnL3Rm5zsbcYy84iHaQ4lBwCA4VpS5+y11v4hyTNnWb8nyfMXux6OzmKNiwIAAAszynP26IjFGBcFAAAWRthjINascSg5AAAsJUvpUHUAAAAGRGdvgCYne6OMO3f2dqncuLF3TxsAAMBiE/YGZPv2QzcpueKK3iYlGzaMujoAAGClMcY5AJOTvaA3OXngvLk9ew6s33//aOsDAABWHmFvADZv7nX0ZrN/f+86AADAYhL2BmDnzgMdvYPt2dM7jgAAAGAxCXsDsG5d7x692axe3Tt3DgAAYDEJewOwcWOyao5/yVWretcBAAAWk7A3AGNjvV03x8YOdPhWrz6wvmbNaOsDAABWHkcvDMiGDckdd/Q2Y9m1qze6uXGjoAcAAIyGsDdAa9Ykl1026ioAAACMcQIAAHSSsAcAANBBwh4AAEAHCXsAAAAdJOwBAAB0kLAHAADQQcIeAABABwl7AAAAHSTsAQAAdJCwBwAA0EHCHgAAQAcJewAAAB0k7AEAAHSQsAcAANBBwh4AAEAHCXsAAAAdJOzx/7d357FylWUcx7+/tJWiZVFEgoVQorgAmkqhgWigFYNiTIAEF0QFtxjc0AQNLtEao+KCCxpRVMISBGswUUkQkQAmhlCxFEpFVAQERBFxoUZU9PGP894wXmemV1LuvT3z/SQn98xz3jnnPTNP39vnvufMSJIkSeohiz1JkiRJ6iGLPUmSJEnqoVTVXPfhEUvye+COue6HHnVPBO6b605oXjI3NIx5oWHMC41ibmiYbSkv9qqqXYdt2KaLPU2GJNdV1YFz3Q/NP+aGhjEvNIx5oVHMDQ3Tl7zwMk5JkiRJ6iGLPUmSJEnqIYs9bQvOmusOaN4yNzSMeaFhzAuNYm5omF7khffsSZIkSVIPObMnSZIkST1ksSdJkiRJPWSxpzmR5Owk9ya5aSD2hCSXJ/lF+/n4gW3vSfLLJLckeeFAfEWSjW3bGUky2+eirWdEXqxJcneSDW158cA282ICJNkzyZVJbk6yKcnJLe6YMcHG5IVjxoRLsjjJuiQ3tNz4UIs7ZkywMXnR7zGjqlxcZn0BDgUOAG4aiH0COLWtnwp8vK3vC9wAbAfsDdwKLGjb1gGHAAEuBY6c63Nz2ep5sQY4ZUhb82JCFmB34IC2vgPw8/b+O2ZM8DImLxwzJnxp7+OStr4IuBY42DFjspcxedHrMcOZPc2JqvohcP+08FHAuW39XODogfhFVfX3qroN+CWwMsnuwI5VdU11//LOG3iOtkEj8mIU82JCVNU9VbW+rT8A3AwsxTFjoo3Ji1HMiwlRnc3t4aK2FI4ZE21MXozSi7yw2NN8sltV3QPdL3HgSS2+FLhzoN1dLba0rU+Pq3/emuTGdpnn1GU35sUESrIMeA7dX2QdMwT8T16AY8bES7IgyQbgXuDyqnLM0Ki8gB6PGRZ72hYMuw66xsTVL2cCTwGWA/cAp7e4eTFhkiwBLgbeUVV/Gdd0SMzc6KkheeGYIarqX1W1HNiDbjZm/zHNzY0JMSIvej1mWOxpPvldmxqn/by3xe8C9hxotwfwmxbfY0hcPVJVv2uD87+BrwAr2ybzYoIkWUT3H/oLqupbLeyYMeGG5YVjhgZV1Z+Aq4AX4ZihZjAv+j5mWOxpPvkOcEJbPwH49kD8FUm2S7I3sA+wrl2C8UCSg9unIL1m4DnqialfzM0xwNQndZoXE6K9j18Dbq6qTw9scsyYYKPywjFDSXZNsnNb3x54AfAzHDMm2qi86PuYsXCuO6DJlORCYBXwxCR3AR8ETgPWJnk98GvgpQBVtSnJWuCnwEPAW6rqX21XJwHnANvTfRrSpbN4GtrKRuTFqiTL6S6RuB14E5gXE+a5wKuBje1eC4D34pgx6UblxXGOGRNvd+DcJAvoJjbWVtUlSa7BMWOSjcqL8/s8ZqT7EBlJkiRJUp94GackSZIk9ZDFniRJkiT1kMWeJEmSJPWQxZ4kSZIk9ZDFniRJkiT1kMWeJOlRlWSXJBva8tskdw88fsy0tu9I8tgZ7POqJAcOib8kyfVJbkjy0yRv2prn8kglWTPtvE97BPvYOcmbt9DmmCSV5BmPvLeSpL7wqxckSbMmyRpgc1V9asT224EDq+q+LeznKuCUqrpuILYIuANYWVV3JdkOWFZVt2yl7g/rx8KqemgG7dYw5rxneKxlwCVVtf+YNmvpvkvqiqpaM2T7goHviZIk9Zwze5KkWZfk8DYDtzHJ2Um2S/J24MnAlUmubO3OTHJdkk1JPrSF3e4ALAT+AFBVf58q9JLsneSaJD9O8uEkm1t8VZJLBvr1hSQntvUPtPY3JTkrSVr8qiQfTXI1cHKSFUmuTvKTJJcl2X2Gr8GCJJ9sx7hxcBYyybsG4lPnfRrwlDYz+Mkh+1tC90XjrwdeMRBfleTKJF+n+wLyocdNsiTJFUnWt/flqJmchyRp/rLYkyTNtsXAOcDLq+pZdAXaSVV1BvAbYHVVrW5t31dVBwLPBg5L8uxRO62q+4HvAHckuTDJ8Ummfs99Djizqg4CfjvDfn6hqg5qM2nbAy8Z2LZzVR0GnAF8Hji2qlYAZwMfGbG/dw5cxvlCuqLsz61PBwFvbEXpEcA+wEpgObAiyaHAqcCtVbW8qt41ZP9HA9+rqp8D9yc5YGDbSrrXct9RxwUeBI6pqgOA1cDpUwWuJGnbZLEnSZptC4DbWlECcC5w6Ii2L0uyHrge2A/Yd9yOq+oNwOHAOuAUuuILuhmvC9v6+TPs5+ok1ybZCDy/HX/KN9rPpwP7A5cn2QC8H9hjxP4+0wq15VV1GXAE8Jr2vGuBXeiKvCPacj2wHnhGi2/JccBFbf2i9njKuqq6ra2POm6Ajya5EfgBsBTYbQbHlSTNUwvnugOSpInz15k0arNNpwAHVdUfk5xDNys4VlVtpLtc8XzgNuDEqU1Dmj/Ef//hc3E79mLgi3T3D97Z7rkbPPbUOQTYVFWHzOScpgnwtlb4PRzsZv0+VlVfnhZfNnJHyS50Ben+SYquoK4k757W33HHPRHYFVhRVf9s909u8fWWJM1fzuxJkmbbYmBZkqe2x68Grm7rD9DdewewI12R8uckuwFHjttpu+ds1UBoOd0HtgD8iIfvYzt+oM0dwL7tnsGd6GYFp/oIcF+7F+7YEYe9Bdg1ySGtD4uS7Dei7XSXASe1D5YhydOSPK7FX9eOS5KlSZ7Ef7820x0LnFdVe1XVsqrak67Qfd7/cdydgHtbobca2GuG5yFJmqec2ZMkzbYHgdcC30yyEPgx8KW27Szg0iT3VNXqJNcDm4Bf0RVs4wR4d5IvA3+jKxRPbNtOBr6e5GTg4qkntFm7tcCNwC/oLp2kqv6U5CvARuD21sf/UVX/SHIscEYrFhcCn2193pKvAsuA9e3euN8DR1fV95M8E7im3TK3GXhVVd2a5EdJbgIunXbf3nF0H+Ay6GLglTx8yenY4wIXAN9Nch2wAfjZDM5BkjSP+dULkqSJk2RzVS2Z635IkvRo8jJOSZIkSeohZ/YkSZIkqYec2ZMkSZKkHrLYkyRJkqQestiTJEmSpB6y2JMkSZKkHrLYkyRJkqQe+g99gK1VnVHCgwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 1080x720 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plot_scatter_chart(df7,\"Hebbal\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [],
   "source": [
    "def remove_bhk_outliers(df):\n",
    "    exclude_indices = np.array([])\n",
    "    for location,location_df in df.groupby('location'):\n",
    "        bhk_stats = {}\n",
    "        for bhk,bhk_df in location_df.groupby('bhk'):\n",
    "            bhk_stats[bhk] = {\n",
    "                'mean':np.mean(bhk_df.price_per_sqft),\n",
    "                'std':np.std(bhk_df.price_per_sqft),\n",
    "                'count':bhk_df.shape[0]\n",
    "            }\n",
    "        for bhk,bhk_df in location_df.groupby(\"bhk\"):\n",
    "            stats = bhk_stats.get(bhk-1)\n",
    "            if stats and stats['count']>5:\n",
    "                exclude_indices = np.append(exclude_indices,bhk_df[bhk_df.price_per_sqft < (stats['mean'])].index.values)\n",
    "    return df.drop(exclude_indices,axis='index')       "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {},
   "outputs": [],
   "source": [
    "df8 = remove_bhk_outliers(df7)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(7329, 7)"
      ]
     },
     "execution_count": 114,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df8.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Outlier Removal Using Bathrooms Feature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 4.,  3.,  2.,  5.,  8.,  1.,  6.,  7.,  9., 12., 16., 13.])"
      ]
     },
     "execution_count": 115,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df8.bath.unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0, 0.5, 'Count')"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA4EAAAJQCAYAAAAwv2HyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3dfbRldX3f8c9Xxgc0EiGMlDCkgykmQeJDHCiJiYliIo0uoVkhIcvotKWhJcRoYkygrpU0XSXFmgdrG7HUWKAxsiZGC/EpUoLYdKE4+ISABKoGJ1CZJI2SZBUFv/3jbMLxzp3hjpkz5878Xq+17jrn/O7e537vbJiZ9+xz9q3uDgAAAGN4xLIHAAAAYP8RgQAAAAMRgQAAAAMRgQAAAAMRgQAAAAMRgQAAAANZaARW1Wer6qaq+lhVbZ/Wjqiqq6vq9un28LntL6iqO6rqtqp6/tz6M6fnuaOqXl9Vtci5AQAADlb740zgc7r76d29ZXp8fpJruvv4JNdMj1NVJyQ5K8lTkpyW5A1Vdci0z8VJzkly/PRx2n6YGwAA4KCzjJeDnp7ksun+ZUnOmFu/orvv6+7PJLkjyclVdXSSw7r7+p79ZPvL5/YBAABgL2xY8PN3kvdVVSf5z919SZKjuvvuJOnuu6vqidO2xyT54Ny+O6a1L0/3V67v0ZFHHtmbN2/+u38HAAAAB6Abb7zxz7p748r1RUfgs7r7rin0rq6qT+1h29Xe59d7WN/1CarOyexlo/mmb/qmbN++fW/nBQAAOChU1Z+str7Ql4N2913T7T1J3pHk5CSfn17imen2nmnzHUmOndt9U5K7pvVNq6yv9vUu6e4t3b1l48ZdghcAAGB4C4vAqnpcVT3+wftJfiDJJ5NclWTrtNnWJFdO969KclZVPbqqjsvsAjA3TC8dvbeqTpmuCvrSuX0AAADYC4t8OehRSd4x/TSHDUl+p7vfW1UfTrKtqs5OcmeSM5Oku2+uqm1Jbklyf5LzuvuB6bnOTXJpkkOTvGf6AAAAYC/V7IKbB58tW7a09wQCAACjqqob535U399axo+IAAAAYElEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEBEIAAAwEA2LHuA0Ww+/13LHmEpPnvRC5Y9AgAAEGcCAQAAhiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABrLwCKyqQ6rqo1X1zunxEVV1dVXdPt0ePrftBVV1R1XdVlXPn1t/ZlXdNH3u9VVVi54bAADgYLQ/zgS+PMmtc4/PT3JNdx+f5JrpcarqhCRnJXlKktOSvKGqDpn2uTjJOUmOnz5O2w9zAwAAHHQWGoFVtSnJC5K8aW759CSXTfcvS3LG3PoV3X1fd38myR1JTq6qo5Mc1t3Xd3cnuXxuHwAAAPbCos8Evi7Jzyf5ytzaUd19d5JMt0+c1o9J8rm57XZMa8dM91eu76Kqzqmq7VW1fefOnfvmOwAAADiILCwCq+qFSe7p7hvXussqa72H9V0Xuy/p7i3dvWXjxo1r/LIAAADj2LDA535WkhdV1Q8meUySw6rqt5N8vqqO7u67p5d63jNtvyPJsXP7b0py17S+aZV1AAAA9tLCzgR29wXdvam7N2d2wZc/7O4fT3JVkq3TZluTXDndvyrJWVX16Ko6LrMLwNwwvWT03qo6Zboq6Evn9gEAAGAvLPJM4O5clGRbVZ2d5M4kZyZJd99cVduS3JLk/iTndfcD0z7nJrk0yaFJ3jN9AAAAsJf2SwR29/uTvH+6/+dJTt3NdhcmuXCV9e1JTlzchAAAAGPYHz8nEAAAgHVCBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxkYRFYVY+pqhuq6uNVdXNV/fK0fkRVXV1Vt0+3h8/tc0FV3VFVt1XV8+fWn1lVN02fe31V1aLmBgAAOJgt8kzgfUme291PS/L0JKdV1SlJzk9yTXcfn+Sa6XGq6oQkZyV5SpLTkryhqg6ZnuviJOckOX76OG2BcwMAABy0FhaBPfNX08NHTh+d5PQkl03rlyU5Y7p/epIruvu+7v5MkjuSnFxVRyc5rLuv7+5OcvncPgAAAOyFhb4nsKoOqaqPJbknydXd/aEkR3X33Uky3T5x2vyYJJ+b233HtHbMdH/lOgAAAHtpoRHY3Q9099OTbMrsrN6Je9h8tff59R7Wd32CqnOqantVbd+5c+feDwwAAHCQ2y9XB+3uv0zy/szey/f56SWemW7vmTbbkeTYud02JblrWt+0yvpqX+eS7t7S3Vs2bty4T78HAACAg8Eirw66saqeMN0/NMnzknwqyVVJtk6bbU1y5XT/qiRnVdWjq+q4zC4Ac8P0ktF7q+qU6aqgL53bBwAAgL2wYYHPfXSSy6YrfD4iybbufmdVXZ9kW1WdneTOJGcmSXffXFXbktyS5P4k53X3A9NznZvk0iSHJnnP9AEAAMBeWlgEdvcnkjxjlfU/T3Lqbva5MMmFq6xvT7Kn9xMCAACwBvvlPYEAAACsDyIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgICIQAABgIGuKwKp61lrWAAAAWN/WeibwP65xDQAAgHVsw54+WVXfmeS7kmysqp+d+9RhSQ5Z5GAAAADse3uMwCSPSvJ103aPn1v/YpIfXtRQAAAALMYeI7C7r0tyXVVd2t1/sp9mAgAAYEEe7kzggx5dVZck2Ty/T3c/dxFDAQAAsBhrjcDfTfLGJG9K8sDixgEAAGCR1hqB93f3xQudBAAAgIVb64+I+P2q+smqOrqqjnjwY6GTAQAAsM+t9Uzg1un2VXNrneRJ+3YcAAAAFmlNEdjdxy16EAAAABZvTRFYVS9dbb27L9+34wAAALBIa3056Elz9x+T5NQkH0kiAgEAAA4ga3056MvmH1fV1yf5bwuZCAAAgIVZ69VBV/qbJMfvy0EAAABYvLW+J/D3M7saaJIckuTbkmxb1FAAAAAsxlrfE/irc/fvT/In3b1jAfMAAACwQGt6OWh3X5fkU0ken+TwJF9a5FAAAAAsxpoisKp+JMkNSc5M8iNJPlRVP7zIwQAAANj31vpy0FcnOam770mSqtqY5H8keduiBgMAAGDfW+vVQR/xYABO/nwv9gUAAGCdWOuZwPdW1R8keev0+EeTvHsxIwEAALAoe4zAqvoHSY7q7ldV1Q8l+e4kleT6JG/ZD/MBAACwDz3cSzpfl+TeJOnut3f3z3b3z2R2FvB1ix4OAACAfevhInBzd39i5WJ3b0+yeSETAQAAsDAPF4GP2cPnDt2XgwAAALB4DxeBH66qn1i5WFVnJ7lxMSMBAACwKA93ddBXJHlHVb04D0XfliSPSvKPFzkYAAAA+94eI7C7P5/ku6rqOUlOnJbf1d1/uPDJAAAA2OfW9HMCu/vaJNcueBYAAAAW7OHeEwgAAMBBRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMRAQCAAAMZGERWFXHVtW1VXVrVd1cVS+f1o+oqqur6vbp9vC5fS6oqjuq6raqev7c+jOr6qbpc6+vqlrU3AAAAAezRZ4JvD/JK7v725KckuS8qjohyflJrunu45NcMz3O9LmzkjwlyWlJ3lBVh0zPdXGSc5IcP32ctsC5AQAADloLi8Duvru7PzLdvzfJrUmOSXJ6ksumzS5LcsZ0//QkV3T3fd39mSR3JDm5qo5Oclh3X9/dneTyuX0AAADYC/vlPYFVtTnJM5J8KMlR3X13MgvFJE+cNjsmyefmdtsxrR0z3V+5DgAAwF5aeARW1dcl+b0kr+juL+5p01XWeg/rq32tc6pqe1Vt37lz594PCwAAcJBbaARW1SMzC8C3dPfbp+XPTy/xzHR7z7S+I8mxc7tvSnLXtL5plfVddPcl3b2lu7ds3Lhx330jAAAAB4lFXh20kvxWklu7+9fnPnVVkq3T/a1JrpxbP6uqHl1Vx2V2AZgbppeM3ltVp0zP+dK5fQAAANgLGxb43M9K8pIkN1XVx6a1f5XkoiTbqursJHcmOTNJuvvmqtqW5JbMrix6Xnc/MO13bpJLkxya5D3TBwAAAHtpYRHY3X+U1d/PlySn7mafC5NcuMr69iQn7rvpAAAAxrRfrg4KAADA+iACAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABiICAQAABrJh2QPAw9l8/ruWPcLSfPaiFyx7BAAADjLOBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxkYRFYVW+uqnuq6pNza0dU1dVVdft0e/jc5y6oqjuq6raqev7c+jOr6qbpc6+vqlrUzAAAAAe7RZ4JvDTJaSvWzk9yTXcfn+Sa6XGq6oQkZyV5yrTPG6rqkGmfi5Ock+T46WPlcwIAALBGC4vA7v5Akr9YsXx6ksum+5clOWNu/Yruvq+7P5PkjiQnV9XRSQ7r7uu7u5NcPrcPAAAAe2l/vyfwqO6+O0mm2ydO68ck+dzcdjumtWOm+yvXV1VV51TV9qravnPnzn06OAAAwMFgvVwYZrX3+fUe1lfV3Zd095bu3rJx48Z9NhwAAMDBYn9H4Oenl3hmur1nWt+R5Ni57TYluWta37TKOgAAAF+D/R2BVyXZOt3fmuTKufWzqurRVXVcZheAuWF6yei9VXXKdFXQl87tAwAAwF7asKgnrqq3Jvm+JEdW1Y4kv5TkoiTbqursJHcmOTNJuvvmqtqW5JYk9yc5r7sfmJ7q3MyuNHpokvdMHwAAAHwNFhaB3f1ju/nUqbvZ/sIkF66yvj3JiftwNAAAgGGtlwvDAAAAsB+IQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIGIQAAAgIFsWPYAwGJsPv9dyx5hKT570QuWPQIAwLrmTCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBARCAAAMBANix7AID1ZPP571r2CEvx2YtesOwRAID9xJlAAACAgYhAAACAgYhAAACAgYhAAACAgRwwEVhVp1XVbVV1R1Wdv+x5AAAADkQHxNVBq+qQJL+Z5PuT7Ejy4aq6qrtvWe5kALiiKgAcWA6UM4EnJ7mjuz/d3V9KckWS05c8EwAAwAHngDgTmOSYJJ+be7wjyT9c0iwA8HfmDCr7i//WgJWqu5c9w8OqqjOTPL+7//n0+CVJTu7ul63Y7pwk50wPvyXJbft10IPTkUn+bNlDsCrHZn1zfNYvx2Z9c3zWL8dmfXN81q9lHpu/390bVy4eKGcCdyQ5du7xpiR3rdyouy9Jcsn+GmoEVbW9u7csew525disb47P+uXYrG+Oz/rl2Kxvjs/6tR6PzYHynsAPJzm+qo6rqkclOSvJVUueCQAA4IBzQJwJ7O77q+qnkvxBkkOSvLm7b17yWAAAAAecAyICk6S7353k3cueY0BeXrt+OTbrm+Ozfjk265vjs345Nuub47N+rbtjc0BcGAYAAIB940B5TyAAAAD7gAhkF1V1bFVdW1W3VtXNVfXyZc/ErqrqkKr6aFW9c9mz8JCqekJVva2qPjX9P/Sdy56Jh1TVz0y/r32yqt5aVY9Z9kwjq6o3V9U9VfXJubUjqurqqrp9uj18mTOOajfH5rXT722fqKp3VNUTljnjyFY7PnOf+7mq6qo6chmzjW53x6aqXlZVt01/Bv37Zc33IBHIau5P8sru/rYkpyQ5r6pOWPJM7OrlSW5d9hDs4j8keW93f2uSp8UxWjeq6pgkP51kS3efmNmFxs5a7lTDuzTJaSvWzk9yTXcfn+Sa6TH736XZ9dhcneTE7n5qkj9OcsH+Hoq/dWl2PT6pqmOTfH+SO/f3QPytS7Pi2FTVc5KcnuSp3f2UJL+6hLm+ighkF919d3d/ZLp/b2Z/iT1muVMxr6o2JXlBkjctexYeUlWHJXl2kt9Kku7+Unf/5XKnYoUNSQ6tqg1JHptVfuYs+093fyDJX6xYPj3JZdP9y5KcsV+HIsnqx6a739fd908PP5jZz21mCXbz/06S/EaSn0/ioh9Lsptjc26Si7r7vmmbe/b7YCuIQPaoqjYneUaSDy13ElZ4XWa/yX9l2YPwVZ6UZGeS/zq9VPdNVfW4ZQ/FTHf/aWb/+npnkruTfKG737fcqVjFUd19dzL7R8kkT1zyPKzunyV5z7KH4CFV9aIkf9rdH1/2LOziyUm+p6o+VFXXVdVJyx5IBLJbVfV1SX4vySu6+4vLnoeZqnphknu6+8Zlz8IuNiT5jiQXd/czkvx1vJRt3ZjeW3Z6kuOSfGOSx1XVjy93KjjwVNWrM3vryFuWPQszVfXYJK9O8ovLnoVVbUhyeGZvs3pVkm1VVcscSASyqqp6ZGYB+Jbufvuy5+GrPCvJi6rqs0muSPLcqvrt5Y7EZEeSHd394Jnzt2UWhawPz0vyme7e2d1fTvL2JN+15JnY1eer6ugkmW6X/rIpHlJVW5O8MMmL288ZW0++ObN/4Pr49PeDTUk+UlV/b6lT8aAdSd7eMzdk9kqupV64RwSyi+lfJn4rya3d/evLnoev1t0XdPem7t6c2UUt/rC7nc1YB7r7/yT5XFV9y7R0apJbljgSX+3OJKdU1WOn3+dOjQv3rEdXJdk63d+a5MolzsKcqjotyS8keVF3/82y5+Eh3X1Tdz+xuzdPfz/YkeQ7pj+XWL7/nuS5SVJVT07yqCR/tsyBRCCreVaSl2R2hulj08cPLnsoOEC8LMlbquoTSZ6e5FeWPA+T6Qzt25J8JMlNmf0ZeMlShxpcVb01yfVJvqWqdlTV2UkuSvL9VXV7Zlc5vGiZM45qN8fmPyV5fJKrp78bvHGpQw5sN8eHdWA3x+bNSZ40/diIK5JsXfaZ9HImHwAAYBzOBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAIAAAxEBAKwdFXVVfVrc49/rqr+9T567kur6of3xXM9zNc5s6puraprV6x/X1W9cy+f6xVV9di5x3+1r+YEABEIwHpwX5Ifqqojlz3IvKo6ZC82PzvJT3b3c/bBl35Fksc+7FZzqmrDPvi6AAxABAKwHtyf2Q9u/5mVn1h5Ju/Bs2LTGbbrqmpbVf1xVV1UVS+uqhuq6qaq+ua5p3leVf3PabsXTvsfUlWvraoPV9UnqupfzD3vtVX1O5n9UPmV8/zY9PyfrKrXTGu/mOS7k7yxql67yvd3WFW9o6puqao3VtUjpv0urqrtVXVzVf3ytPbTSb4xybXzZxWr6sKq+nhVfbCqjpr7tfn1abvXVNXTp89/Yvp6h0/b7W79/VX1G1X1geks5klV9faqur2q/u20zeOq6l3T1/5kVf3oWg4oAOuXCARgvfjNJC+uqq/fi32eluTlSb49yUuSPLm7T07ypiQvm9tuc5LvTfKCzELtMZmduftCd5+U5KQkP1FVx03bn5zk1d19wvwXq6pvTPKaJM9N8vQkJ1XVGd39b5JsT/Li7n7VKnOenOSV05zfnOSHpvVXd/eWJE9N8r1V9dTufn2Su5I8Z+6s4uOSfLC7n5bkA0l+Yu65n5zked39yiSXJ/mF7n5qZgH7S9M2u1tPki9197OTvDHJlUnOS3Jikn9SVd+Q5LQkd3X307r7xCTvXeX7A+AAIgIBWBe6+4uZxcpP78VuH+7uu7v7viT/O8n7pvWbMgu/B23r7q909+1JPp3kW5P8QJKXVtXHknwoyTckOX7a/obu/swqX++kJO/v7p3dfX+StyR59hrmvKG7P93dDyR5a2ZnDZPkR6rqI0k+muQpSU7Yzf5fSvLg+wpvXPG9/W53PzDF8xO6+7pp/bIkz97d+tz+V023NyW5ee7X89NJjp3Wn1dVr6mq7+nuL6zh+wVgHROBAKwnr59ju70AAAHRSURBVMvsDN3j5tbuz/TnVVVVkkfNfe6+uftfmXv8lSTz75HrFV+nk1SSl3X306eP47r7wYj8693MV2v9Rlb5el/1eDrr+HNJTp3O0L0ryWN2s/+Xu/vB53ggX/297W7WtZr/NVv567mhu/84yTMzi8F/N730FYADmAgEYN3o7r9Isi2zEHzQZzOLkCQ5Pckjv4anPrOqHjG9T/BJSW5L8gdJzq2qRyZJVT25qh63pyfJ7Izh91bVkdNFY34syXUPs0+SnFxVx03vBfzRJH+U5LDMAu4L03v8/tHc9vcmefxefH+ZztD936r6nmnpJUmu2936Wp93egns33T3byf51STfsTdzAbD+uJIYAOvNryX5qbnH/yXJlVV1Q5Jr8rWd+bots/A5Ksm/7O7/V1VvyuxllR+ZzjDuTHLGnp6ku++uqguSXJvZWcF3d/eVa/j61ye5KLP3BH4gyTu6+ytV9dEkN2f20sv/Nbf9JUneU1V37+XVRrdm9p7Hx07P+U8fZn0tvj3Ja6vqK0m+nOTcvdgXgHWoHnp1CQAAAAc7LwcFAAAYiAgEAAAYiAgEAAAYiAgEAAAYiAgEAAAYiAgEAAAYiAgEAAAYiAgEAAAYyP8HWwN7/4koG1EAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 1080x720 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.hist(df8.bath,rwidth=0.8)\n",
    "plt.xlabel(\"Number of bathrooms\")\n",
    "plt.ylabel(\"Count\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>location</th>\n",
       "      <th>size</th>\n",
       "      <th>total_sqft</th>\n",
       "      <th>bath</th>\n",
       "      <th>price</th>\n",
       "      <th>bhk</th>\n",
       "      <th>price_per_sqft</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5277</th>\n",
       "      <td>Neeladri Nagar</td>\n",
       "      <td>10 BHK</td>\n",
       "      <td>4000.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>160.0</td>\n",
       "      <td>10</td>\n",
       "      <td>4000.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8486</th>\n",
       "      <td>other</td>\n",
       "      <td>10 BHK</td>\n",
       "      <td>12000.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>525.0</td>\n",
       "      <td>10</td>\n",
       "      <td>4375.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8575</th>\n",
       "      <td>other</td>\n",
       "      <td>16 BHK</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>16.0</td>\n",
       "      <td>550.0</td>\n",
       "      <td>16</td>\n",
       "      <td>5500.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9308</th>\n",
       "      <td>other</td>\n",
       "      <td>11 BHK</td>\n",
       "      <td>6000.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>11</td>\n",
       "      <td>2500.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9639</th>\n",
       "      <td>other</td>\n",
       "      <td>13 BHK</td>\n",
       "      <td>5425.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>275.0</td>\n",
       "      <td>13</td>\n",
       "      <td>5069.124424</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            location    size  total_sqft  bath  price  bhk  price_per_sqft\n",
       "5277  Neeladri Nagar  10 BHK      4000.0  12.0  160.0   10     4000.000000\n",
       "8486           other  10 BHK     12000.0  12.0  525.0   10     4375.000000\n",
       "8575           other  16 BHK     10000.0  16.0  550.0   16     5500.000000\n",
       "9308           other  11 BHK      6000.0  12.0  150.0   11     2500.000000\n",
       "9639           other  13 BHK      5425.0  13.0  275.0   13     5069.124424"
      ]
     },
     "execution_count": 117,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df8[df8.bath>10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#####  It is unusual to have 2 more bathrooms than number of bedrooms in a home"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>location</th>\n",
       "      <th>size</th>\n",
       "      <th>total_sqft</th>\n",
       "      <th>bath</th>\n",
       "      <th>price</th>\n",
       "      <th>bhk</th>\n",
       "      <th>price_per_sqft</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1626</th>\n",
       "      <td>Chikkabanavar</td>\n",
       "      <td>4 Bedroom</td>\n",
       "      <td>2460.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>80.0</td>\n",
       "      <td>4</td>\n",
       "      <td>3252.032520</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5238</th>\n",
       "      <td>Nagasandra</td>\n",
       "      <td>4 Bedroom</td>\n",
       "      <td>7000.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>450.0</td>\n",
       "      <td>4</td>\n",
       "      <td>6428.571429</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6711</th>\n",
       "      <td>Thanisandra</td>\n",
       "      <td>3 BHK</td>\n",
       "      <td>1806.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>116.0</td>\n",
       "      <td>3</td>\n",
       "      <td>6423.034330</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8411</th>\n",
       "      <td>other</td>\n",
       "      <td>6 BHK</td>\n",
       "      <td>11338.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>1000.0</td>\n",
       "      <td>6</td>\n",
       "      <td>8819.897689</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           location       size  total_sqft  bath   price  bhk  price_per_sqft\n",
       "1626  Chikkabanavar  4 Bedroom      2460.0   7.0    80.0    4     3252.032520\n",
       "5238     Nagasandra  4 Bedroom      7000.0   8.0   450.0    4     6428.571429\n",
       "6711    Thanisandra      3 BHK      1806.0   6.0   116.0    3     6423.034330\n",
       "8411          other      6 BHK     11338.0   9.0  1000.0    6     8819.897689"
      ]
     },
     "execution_count": 118,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df8[df8.bath>df8.bhk + 2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {},
   "outputs": [],
   "source": [
    "df9 = df8[df8.bath < df8.bhk + 2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(7251, 7)"
      ]
     },
     "execution_count": 120,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df9.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>location</th>\n",
       "      <th>size</th>\n",
       "      <th>total_sqft</th>\n",
       "      <th>bath</th>\n",
       "      <th>price</th>\n",
       "      <th>bhk</th>\n",
       "      <th>price_per_sqft</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>4 BHK</td>\n",
       "      <td>2850.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>428.0</td>\n",
       "      <td>4</td>\n",
       "      <td>15017.543860</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>3 BHK</td>\n",
       "      <td>1630.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>194.0</td>\n",
       "      <td>3</td>\n",
       "      <td>11901.840491</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>3 BHK</td>\n",
       "      <td>1875.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>235.0</td>\n",
       "      <td>3</td>\n",
       "      <td>12533.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>3 BHK</td>\n",
       "      <td>1200.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>130.0</td>\n",
       "      <td>3</td>\n",
       "      <td>10833.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>2 BHK</td>\n",
       "      <td>1235.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>148.0</td>\n",
       "      <td>2</td>\n",
       "      <td>11983.805668</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10232</th>\n",
       "      <td>other</td>\n",
       "      <td>2 BHK</td>\n",
       "      <td>1200.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>70.0</td>\n",
       "      <td>2</td>\n",
       "      <td>5833.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10233</th>\n",
       "      <td>other</td>\n",
       "      <td>1 BHK</td>\n",
       "      <td>1800.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>200.0</td>\n",
       "      <td>1</td>\n",
       "      <td>11111.111111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10236</th>\n",
       "      <td>other</td>\n",
       "      <td>2 BHK</td>\n",
       "      <td>1353.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>110.0</td>\n",
       "      <td>2</td>\n",
       "      <td>8130.081301</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10237</th>\n",
       "      <td>other</td>\n",
       "      <td>1 Bedroom</td>\n",
       "      <td>812.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>26.0</td>\n",
       "      <td>1</td>\n",
       "      <td>3201.970443</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10240</th>\n",
       "      <td>other</td>\n",
       "      <td>4 BHK</td>\n",
       "      <td>3600.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>400.0</td>\n",
       "      <td>4</td>\n",
       "      <td>11111.111111</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>7251 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                  location       size  total_sqft  bath  price  bhk  \\\n",
       "0      1st Block Jayanagar      4 BHK      2850.0   4.0  428.0    4   \n",
       "1      1st Block Jayanagar      3 BHK      1630.0   3.0  194.0    3   \n",
       "2      1st Block Jayanagar      3 BHK      1875.0   2.0  235.0    3   \n",
       "3      1st Block Jayanagar      3 BHK      1200.0   2.0  130.0    3   \n",
       "4      1st Block Jayanagar      2 BHK      1235.0   2.0  148.0    2   \n",
       "...                    ...        ...         ...   ...    ...  ...   \n",
       "10232                other      2 BHK      1200.0   2.0   70.0    2   \n",
       "10233                other      1 BHK      1800.0   1.0  200.0    1   \n",
       "10236                other      2 BHK      1353.0   2.0  110.0    2   \n",
       "10237                other  1 Bedroom       812.0   1.0   26.0    1   \n",
       "10240                other      4 BHK      3600.0   5.0  400.0    4   \n",
       "\n",
       "       price_per_sqft  \n",
       "0        15017.543860  \n",
       "1        11901.840491  \n",
       "2        12533.333333  \n",
       "3        10833.333333  \n",
       "4        11983.805668  \n",
       "...               ...  \n",
       "10232     5833.333333  \n",
       "10233    11111.111111  \n",
       "10236     8130.081301  \n",
       "10237     3201.970443  \n",
       "10240    11111.111111  \n",
       "\n",
       "[7251 rows x 7 columns]"
      ]
     },
     "execution_count": 121,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df9"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [],
   "source": [
    "df10 = df9.drop(['size','price_per_sqft'],axis = 'columns')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>location</th>\n",
       "      <th>total_sqft</th>\n",
       "      <th>bath</th>\n",
       "      <th>price</th>\n",
       "      <th>bhk</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>2850.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>428.0</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>1630.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>194.0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>1875.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>235.0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>1200.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>130.0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>1235.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>148.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              location  total_sqft  bath  price  bhk\n",
       "0  1st Block Jayanagar      2850.0   4.0  428.0    4\n",
       "1  1st Block Jayanagar      1630.0   3.0  194.0    3\n",
       "2  1st Block Jayanagar      1875.0   2.0  235.0    3\n",
       "3  1st Block Jayanagar      1200.0   2.0  130.0    3\n",
       "4  1st Block Jayanagar      1235.0   2.0  148.0    2"
      ]
     },
     "execution_count": 123,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df10.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using One Hot Encoding For Location"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### For categorical variables where no such ordinal relationship exists, the integer encoding is not enough.In fact, using this encoding and allowing the model to assume a natural ordering between categories may result in poor performance or unexpected results (predictions halfway between categories).In this case, a one-hot encoding can be applied to the integer representation. This is where the integer encoded variable is removed and a new binary variable is added for each unique integer value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>1st Block Jayanagar</th>\n",
       "      <th>1st Phase JP Nagar</th>\n",
       "      <th>2nd Phase Judicial Layout</th>\n",
       "      <th>2nd Stage Nagarbhavi</th>\n",
       "      <th>5th Block Hbr Layout</th>\n",
       "      <th>5th Phase JP Nagar</th>\n",
       "      <th>6th Phase JP Nagar</th>\n",
       "      <th>7th Phase JP Nagar</th>\n",
       "      <th>8th Phase JP Nagar</th>\n",
       "      <th>9th Phase JP Nagar</th>\n",
       "      <th>...</th>\n",
       "      <th>Vishveshwarya Layout</th>\n",
       "      <th>Vishwapriya Layout</th>\n",
       "      <th>Vittasandra</th>\n",
       "      <th>Whitefield</th>\n",
       "      <th>Yelachenahalli</th>\n",
       "      <th>Yelahanka</th>\n",
       "      <th>Yelahanka New Town</th>\n",
       "      <th>Yelenahalli</th>\n",
       "      <th>Yeshwanthpur</th>\n",
       "      <th>other</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 242 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   1st Block Jayanagar  1st Phase JP Nagar  2nd Phase Judicial Layout  \\\n",
       "0                    1                   0                          0   \n",
       "1                    1                   0                          0   \n",
       "2                    1                   0                          0   \n",
       "3                    1                   0                          0   \n",
       "4                    1                   0                          0   \n",
       "\n",
       "   2nd Stage Nagarbhavi  5th Block Hbr Layout  5th Phase JP Nagar  \\\n",
       "0                     0                     0                   0   \n",
       "1                     0                     0                   0   \n",
       "2                     0                     0                   0   \n",
       "3                     0                     0                   0   \n",
       "4                     0                     0                   0   \n",
       "\n",
       "   6th Phase JP Nagar  7th Phase JP Nagar  8th Phase JP Nagar  \\\n",
       "0                   0                   0                   0   \n",
       "1                   0                   0                   0   \n",
       "2                   0                   0                   0   \n",
       "3                   0                   0                   0   \n",
       "4                   0                   0                   0   \n",
       "\n",
       "   9th Phase JP Nagar  ...  Vishveshwarya Layout  Vishwapriya Layout  \\\n",
       "0                   0  ...                     0                   0   \n",
       "1                   0  ...                     0                   0   \n",
       "2                   0  ...                     0                   0   \n",
       "3                   0  ...                     0                   0   \n",
       "4                   0  ...                     0                   0   \n",
       "\n",
       "   Vittasandra  Whitefield  Yelachenahalli  Yelahanka  Yelahanka New Town  \\\n",
       "0            0           0               0          0                   0   \n",
       "1            0           0               0          0                   0   \n",
       "2            0           0               0          0                   0   \n",
       "3            0           0               0          0                   0   \n",
       "4            0           0               0          0                   0   \n",
       "\n",
       "   Yelenahalli  Yeshwanthpur  other  \n",
       "0            0             0      0  \n",
       "1            0             0      0  \n",
       "2            0             0      0  \n",
       "3            0             0      0  \n",
       "4            0             0      0  \n",
       "\n",
       "[5 rows x 242 columns]"
      ]
     },
     "execution_count": 124,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dummies = pd.get_dummies(df10.location)\n",
    "dummies.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {},
   "outputs": [],
   "source": [
    "df11 = pd.concat([df10,dummies],axis = 'columns')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [],
   "source": [
    "df11 = df11.drop(['other'],axis = 'columns')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>location</th>\n",
       "      <th>total_sqft</th>\n",
       "      <th>bath</th>\n",
       "      <th>price</th>\n",
       "      <th>bhk</th>\n",
       "      <th>1st Block Jayanagar</th>\n",
       "      <th>1st Phase JP Nagar</th>\n",
       "      <th>2nd Phase Judicial Layout</th>\n",
       "      <th>2nd Stage Nagarbhavi</th>\n",
       "      <th>5th Block Hbr Layout</th>\n",
       "      <th>...</th>\n",
       "      <th>Vijayanagar</th>\n",
       "      <th>Vishveshwarya Layout</th>\n",
       "      <th>Vishwapriya Layout</th>\n",
       "      <th>Vittasandra</th>\n",
       "      <th>Whitefield</th>\n",
       "      <th>Yelachenahalli</th>\n",
       "      <th>Yelahanka</th>\n",
       "      <th>Yelahanka New Town</th>\n",
       "      <th>Yelenahalli</th>\n",
       "      <th>Yeshwanthpur</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>2850.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>428.0</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>1630.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>194.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>1875.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>235.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>1200.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>130.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1st Block Jayanagar</td>\n",
       "      <td>1235.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>148.0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 246 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "              location  total_sqft  bath  price  bhk  1st Block Jayanagar  \\\n",
       "0  1st Block Jayanagar      2850.0   4.0  428.0    4                    1   \n",
       "1  1st Block Jayanagar      1630.0   3.0  194.0    3                    1   \n",
       "2  1st Block Jayanagar      1875.0   2.0  235.0    3                    1   \n",
       "3  1st Block Jayanagar      1200.0   2.0  130.0    3                    1   \n",
       "4  1st Block Jayanagar      1235.0   2.0  148.0    2                    1   \n",
       "\n",
       "   1st Phase JP Nagar  2nd Phase Judicial Layout  2nd Stage Nagarbhavi  \\\n",
       "0                   0                          0                     0   \n",
       "1                   0                          0                     0   \n",
       "2                   0                          0                     0   \n",
       "3                   0                          0                     0   \n",
       "4                   0                          0                     0   \n",
       "\n",
       "   5th Block Hbr Layout  ...  Vijayanagar  Vishveshwarya Layout  \\\n",
       "0                     0  ...            0                     0   \n",
       "1                     0  ...            0                     0   \n",
       "2                     0  ...            0                     0   \n",
       "3                     0  ...            0                     0   \n",
       "4                     0  ...            0                     0   \n",
       "\n",
       "   Vishwapriya Layout  Vittasandra  Whitefield  Yelachenahalli  Yelahanka  \\\n",
       "0                   0            0           0               0          0   \n",
       "1                   0            0           0               0          0   \n",
       "2                   0            0           0               0          0   \n",
       "3                   0            0           0               0          0   \n",
       "4                   0            0           0               0          0   \n",
       "\n",
       "   Yelahanka New Town  Yelenahalli  Yeshwanthpur  \n",
       "0                   0            0             0  \n",
       "1                   0            0             0  \n",
       "2                   0            0             0  \n",
       "3                   0            0             0  \n",
       "4                   0            0             0  \n",
       "\n",
       "[5 rows x 246 columns]"
      ]
     },
     "execution_count": 127,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df11.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 128,
   "metadata": {},
   "outputs": [],
   "source": [
    "df12 = df11.drop(['location'],axis = 'columns')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>total_sqft</th>\n",
       "      <th>bath</th>\n",
       "      <th>price</th>\n",
       "      <th>bhk</th>\n",
       "      <th>1st Block Jayanagar</th>\n",
       "      <th>1st Phase JP Nagar</th>\n",
       "      <th>2nd Phase Judicial Layout</th>\n",
       "      <th>2nd Stage Nagarbhavi</th>\n",
       "      <th>5th Block Hbr Layout</th>\n",
       "      <th>5th Phase JP Nagar</th>\n",
       "      <th>...</th>\n",
       "      <th>Vijayanagar</th>\n",
       "      <th>Vishveshwarya Layout</th>\n",
       "      <th>Vishwapriya Layout</th>\n",
       "      <th>Vittasandra</th>\n",
       "      <th>Whitefield</th>\n",
       "      <th>Yelachenahalli</th>\n",
       "      <th>Yelahanka</th>\n",
       "      <th>Yelahanka New Town</th>\n",
       "      <th>Yelenahalli</th>\n",
       "      <th>Yeshwanthpur</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2850.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>428.0</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1630.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>194.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1875.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>235.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1200.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>130.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1235.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>148.0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 245 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   total_sqft  bath  price  bhk  1st Block Jayanagar  1st Phase JP Nagar  \\\n",
       "0      2850.0   4.0  428.0    4                    1                   0   \n",
       "1      1630.0   3.0  194.0    3                    1                   0   \n",
       "2      1875.0   2.0  235.0    3                    1                   0   \n",
       "3      1200.0   2.0  130.0    3                    1                   0   \n",
       "4      1235.0   2.0  148.0    2                    1                   0   \n",
       "\n",
       "   2nd Phase Judicial Layout  2nd Stage Nagarbhavi  5th Block Hbr Layout  \\\n",
       "0                          0                     0                     0   \n",
       "1                          0                     0                     0   \n",
       "2                          0                     0                     0   \n",
       "3                          0                     0                     0   \n",
       "4                          0                     0                     0   \n",
       "\n",
       "   5th Phase JP Nagar  ...  Vijayanagar  Vishveshwarya Layout  \\\n",
       "0                   0  ...            0                     0   \n",
       "1                   0  ...            0                     0   \n",
       "2                   0  ...            0                     0   \n",
       "3                   0  ...            0                     0   \n",
       "4                   0  ...            0                     0   \n",
       "\n",
       "   Vishwapriya Layout  Vittasandra  Whitefield  Yelachenahalli  Yelahanka  \\\n",
       "0                   0            0           0               0          0   \n",
       "1                   0            0           0               0          0   \n",
       "2                   0            0           0               0          0   \n",
       "3                   0            0           0               0          0   \n",
       "4                   0            0           0               0          0   \n",
       "\n",
       "   Yelahanka New Town  Yelenahalli  Yeshwanthpur  \n",
       "0                   0            0             0  \n",
       "1                   0            0             0  \n",
       "2                   0            0             0  \n",
       "3                   0            0             0  \n",
       "4                   0            0             0  \n",
       "\n",
       "[5 rows x 245 columns]"
      ]
     },
     "execution_count": 129,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df12.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(7251, 245)"
      ]
     },
     "execution_count": 130,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df12.shape\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>total_sqft</th>\n",
       "      <th>bath</th>\n",
       "      <th>bhk</th>\n",
       "      <th>1st Block Jayanagar</th>\n",
       "      <th>1st Phase JP Nagar</th>\n",
       "      <th>2nd Phase Judicial Layout</th>\n",
       "      <th>2nd Stage Nagarbhavi</th>\n",
       "      <th>5th Block Hbr Layout</th>\n",
       "      <th>5th Phase JP Nagar</th>\n",
       "      <th>6th Phase JP Nagar</th>\n",
       "      <th>...</th>\n",
       "      <th>Vijayanagar</th>\n",
       "      <th>Vishveshwarya Layout</th>\n",
       "      <th>Vishwapriya Layout</th>\n",
       "      <th>Vittasandra</th>\n",
       "      <th>Whitefield</th>\n",
       "      <th>Yelachenahalli</th>\n",
       "      <th>Yelahanka</th>\n",
       "      <th>Yelahanka New Town</th>\n",
       "      <th>Yelenahalli</th>\n",
       "      <th>Yeshwanthpur</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2850.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1630.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1875.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1200.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1235.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 244 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   total_sqft  bath  bhk  1st Block Jayanagar  1st Phase JP Nagar  \\\n",
       "0      2850.0   4.0    4                    1                   0   \n",
       "1      1630.0   3.0    3                    1                   0   \n",
       "2      1875.0   2.0    3                    1                   0   \n",
       "3      1200.0   2.0    3                    1                   0   \n",
       "4      1235.0   2.0    2                    1                   0   \n",
       "\n",
       "   2nd Phase Judicial Layout  2nd Stage Nagarbhavi  5th Block Hbr Layout  \\\n",
       "0                          0                     0                     0   \n",
       "1                          0                     0                     0   \n",
       "2                          0                     0                     0   \n",
       "3                          0                     0                     0   \n",
       "4                          0                     0                     0   \n",
       "\n",
       "   5th Phase JP Nagar  6th Phase JP Nagar  ...  Vijayanagar  \\\n",
       "0                   0                   0  ...            0   \n",
       "1                   0                   0  ...            0   \n",
       "2                   0                   0  ...            0   \n",
       "3                   0                   0  ...            0   \n",
       "4                   0                   0  ...            0   \n",
       "\n",
       "   Vishveshwarya Layout  Vishwapriya Layout  Vittasandra  Whitefield  \\\n",
       "0                     0                   0            0           0   \n",
       "1                     0                   0            0           0   \n",
       "2                     0                   0            0           0   \n",
       "3                     0                   0            0           0   \n",
       "4                     0                   0            0           0   \n",
       "\n",
       "   Yelachenahalli  Yelahanka  Yelahanka New Town  Yelenahalli  Yeshwanthpur  \n",
       "0               0          0                   0            0             0  \n",
       "1               0          0                   0            0             0  \n",
       "2               0          0                   0            0             0  \n",
       "3               0          0                   0            0             0  \n",
       "4               0          0                   0            0             0  \n",
       "\n",
       "[5 rows x 244 columns]"
      ]
     },
     "execution_count": 131,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = df12.drop('price',axis='columns')\n",
    "X.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0        428.0\n",
       "1        194.0\n",
       "2        235.0\n",
       "3        130.0\n",
       "4        148.0\n",
       "         ...  \n",
       "10232     70.0\n",
       "10233    200.0\n",
       "10236    110.0\n",
       "10237     26.0\n",
       "10240    400.0\n",
       "Name: price, Length: 7251, dtype: float64"
      ]
     },
     "execution_count": 132,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y = df12.price\n",
    "y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 133,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2,random_state = 10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 134,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8452277697873348"
      ]
     },
     "execution_count": 134,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.linear_model import LinearRegression\n",
    "lr_clf = LinearRegression()\n",
    "lr_clf.fit(X_train,y_train)\n",
    "lr_clf.score(X_test,y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use K Fold cross validation to measure accuracy of our LinearRegression model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this method, we split the data-set into k number of subsets(known as folds) then we perform training on the all the subsets but leave one(k-1) subset for the evaluation of the trained model. In this method, we iterate k times with a different subset reserved for testing purpose each time.\n",
    "Always remember, a lower value of k is more biased, and hence undesirable. On the other hand, a higher value of K is less biased, but can suffer from large variability. It is important to know that a smaller value of k always takes us towards validation set approach, whereas a higher value of k leads to LOOCV approach."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.82430186, 0.77166234, 0.85089567, 0.80837764, 0.83653286])"
      ]
     },
     "execution_count": 135,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.model_selection import ShuffleSplit\n",
    "from sklearn.model_selection import cross_val_score\n",
    "cv = ShuffleSplit(n_splits = 5, test_size = 0.2, random_state = 0)\n",
    "cross_val_score(LinearRegression(),X,y,cv=cv)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Find best model using GridSearchCV "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 136,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>model</th>\n",
       "      <th>best_score</th>\n",
       "      <th>best_params</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>linear_regression</td>\n",
       "      <td>0.818354</td>\n",
       "      <td>{'normalize': False}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>lasso</td>\n",
       "      <td>0.687430</td>\n",
       "      <td>{'alpha': 2, 'selection': 'random'}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>decision_tree</td>\n",
       "      <td>0.720273</td>\n",
       "      <td>{'criterion': 'friedman_mse', 'splitter': 'best'}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               model  best_score  \\\n",
       "0  linear_regression    0.818354   \n",
       "1              lasso    0.687430   \n",
       "2      decision_tree    0.720273   \n",
       "\n",
       "                                         best_params  \n",
       "0                               {'normalize': False}  \n",
       "1                {'alpha': 2, 'selection': 'random'}  \n",
       "2  {'criterion': 'friedman_mse', 'splitter': 'best'}  "
      ]
     },
     "execution_count": 136,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.model_selection import GridSearchCV\n",
    "from sklearn.linear_model import Lasso\n",
    "from sklearn.tree import DecisionTreeRegressor\n",
    "\n",
    "def find_best_model_using_gridsearchcv(X,y):\n",
    "    algos = {\n",
    "        'linear_regression' : {\n",
    "            'model': LinearRegression(),\n",
    "            'params': {\n",
    "                'normalize': [True, False]\n",
    "            }\n",
    "        },\n",
    "        'lasso': {\n",
    "            'model': Lasso(),\n",
    "            'params': {\n",
    "                'alpha': [1,2],\n",
    "                'selection': ['random', 'cyclic']\n",
    "            }\n",
    "        },\n",
    "        'decision_tree': {\n",
    "            'model': DecisionTreeRegressor(),\n",
    "            'params': {\n",
    "                'criterion' : ['mse','friedman_mse'],\n",
    "                'splitter': ['best','random']\n",
    "            }\n",
    "        }\n",
    "    }\n",
    "    scores = []\n",
    "    cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)\n",
    "    for algo_name, config in algos.items():\n",
    "        gs =  GridSearchCV(config['model'], config['params'], cv=cv, return_train_score=False)\n",
    "        gs.fit(X,y)\n",
    "        scores.append({\n",
    "            'model': algo_name,\n",
    "            'best_score': gs.best_score_,\n",
    "            'best_params': gs.best_params_\n",
    "        })\n",
    "\n",
    "    return pd.DataFrame(scores,columns=['model','best_score','best_params'])\n",
    "\n",
    "find_best_model_using_gridsearchcv(X,y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Based on above results we can say that LinearRegression gives the best score. Hence we will use that. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Test the model for few properties "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 137,
   "metadata": {},
   "outputs": [],
   "source": [
    "def predict_price(location,sqft,bath,bhk):\n",
    "    loc_index = np.where(X.columns==location)[0][0]\n",
    "    \n",
    "    x = np.zeros(len(X.columns))\n",
    "    x[0] = sqft\n",
    "    x[1] = bath\n",
    "    x[2] = bhk\n",
    "    if loc_index >= 0:\n",
    "        x[loc_index] = 1\n",
    "    return lr_clf.predict([x])[0]    \n",
    "      "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 138,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "83.49904676591962"
      ]
     },
     "execution_count": 138,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "predict_price('1st Phase JP Nagar',1000,2,2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 139,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "184.58430202040012"
      ]
     },
     "execution_count": 139,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "predict_price('Indira Nagar',1000, 3, 3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
关于此算法

使用 k 折交叉验证和 GridSearchCV 寻找房价预测的最佳机器学习算法。

在机器学习中,我们不能仅在训练数据上拟合模型,也不能断言模型对真实数据会准确工作。为此,我们必须确保我们的模型从数据中获得了正确的模式,并且没有过度拟合噪声。出于此目的,我们使用交叉验证技术。交叉验证是一种技术,我们使用数据集的子集训练模型,然后使用数据集的补充子集进行评估。

import pandas as pd
import numpy as np

from matplotlib import pyplot as plt
%matplotlib inline 
import matplotlib
matplotlib.rcParams["figure.figsize"]=(20,10)
df1 = pd.read_csv("Bengaluru_House_Data.csv")
df1.head()
区域类型 可用性 位置 尺寸 小区 总面积(平方英尺) 浴室数量 阳台数量 价格
0 超高层建筑面积 12月19日 电子城二期 2 居室 Coomee 1056 2.0 1.0 39.07
1 占地面积 现房 Chikka Tirupathi 4 居室 Theanmp 2600 5.0 3.0 120.00
2 建筑面积 现房 Uttarahalli 3 居室 NaN 1440 2.0 3.0 62.00
3 超高层建筑面积 现房 Lingadheeranahalli 3 居室 Soiewre 1521 3.0 1.0 95.00
4 超高层建筑面积 现房 Kothanur 2 居室 NaN 1200 2.0 1.0 51.00
df1.groupby('area_type')['area_type'].agg('count')
area_type
Built-up  Area          2418
Carpet  Area              87
Plot  Area              2025
Super built-up  Area    8790
Name: area_type, dtype: int64
df2 = df1.drop(['area_type','society','balcony','availability'] , axis="columns")
df2.head()
位置 尺寸 总面积(平方英尺) 浴室数量 价格
0 电子城二期 2 居室 1056 2.0 39.07
1 Chikka Tirupathi 4 居室 2600 5.0 120.00
2 Uttarahalli 3 居室 1440 2.0 62.00
3 Lingadheeranahalli 3 居室 1521 3.0 95.00
4 Kothanur 2 居室 1200 2.0 51.00

数据清洗:处理 NA/空值

df2.isnull().sum()
location       1
size          16
total_sqft     0
bath          73
price          0
dtype: int64
df3 = df2.dropna()
df3.isnull().sum()
location      0
size          0
total_sqft    0
bath          0
price         0
dtype: int64
df2['size'].unique()
array([&#x27;2 BHK&#x27;, &#x27;4 Bedroom&#x27;, &#x27;3 BHK&#x27;, &#x27;4 BHK&#x27;, &#x27;6 Bedroom&#x27;, &#x27;3 Bedroom&#x27;,
       &#x27;1 BHK&#x27;, &#x27;1 RK&#x27;, &#x27;1 Bedroom&#x27;, &#x27;8 Bedroom&#x27;, &#x27;2 Bedroom&#x27;,
       &#x27;7 Bedroom&#x27;, &#x27;5 BHK&#x27;, &#x27;7 BHK&#x27;, &#x27;6 BHK&#x27;, &#x27;5 Bedroom&#x27;, &#x27;11 BHK&#x27;,
       &#x27;9 BHK&#x27;, nan, &#x27;9 Bedroom&#x27;, &#x27;27 BHK&#x27;, &#x27;10 Bedroom&#x27;, &#x27;11 Bedroom&#x27;,
       &#x27;10 BHK&#x27;, &#x27;19 BHK&#x27;, &#x27;16 BHK&#x27;, &#x27;43 Bedroom&#x27;, &#x27;14 BHK&#x27;, &#x27;8 BHK&#x27;,
       &#x27;12 Bedroom&#x27;, &#x27;13 BHK&#x27;, &#x27;18 Bedroom&#x27;], dtype=object)

特征工程

特征工程是利用领域知识,通过数据挖掘技术从原始数据中提取特征的过程。这些特征可用于提高机器学习算法的性能。特征工程可以被认为是应用机器学习本身。

df3['bhk'] = df3['size'].apply(lambda x: int(x.split(' ')[0]))
&amp;lt;ipython-input-81-4c4c73fbe7f4&amp;gt;:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df3[&#x27;bhk&#x27;] = df3[&#x27;size&#x27;].apply(lambda x: int(x.split(&#x27; &#x27;)[0]))
df3['bhk'].unique()
array([ 2,  4,  3,  6,  1,  8,  7,  5, 11,  9, 27, 10, 19, 16, 43, 14, 12,
       13, 18], dtype=int64)
df3[df3.bhk&gt;20]
位置 尺寸 总面积(平方英尺) 浴室数量 价格 卧室数量
1718 2 电子城二期 27 居室 8000 27.0 230.0 27
4684 Munnekollal 43 居室 2400 40.0 660.0 43
df3.total_sqft.unique()
array([&#x27;1056&#x27;, &#x27;2600&#x27;, &#x27;1440&#x27;, ..., &#x27;1133 - 1384&#x27;, &#x27;774&#x27;, &#x27;4689&#x27;],
      dtype=object)
def is_float(x):
    try:
        float(x)
    except:
        return False
    return True
df3[~df3['total_sqft'].apply(is_float)].head(10)
位置 尺寸 总面积(平方英尺) 浴室数量 价格 卧室数量
30 Yelahanka 4 居室 2100 - 2850 4.0 186.000 4
122 Hebbal 4 居室 3067 - 8156 4.0 477.000 4
137 8 期 JP Nagar 2 居室 1042 - 1105 2.0 54.005 2
165 Sarjapur 2 居室 1145 - 1340 2.0 43.490 2
188 KR Puram 2 居室 1015 - 1540 2.0 56.800 2
410 Kengeri 1 居室 34.46 平方米 1.0 18.500 1
549 Hennur Road 2 居室 1195 - 1440 2.0 63.770 2
648 Arekere 9 居室 4125 英亩 9.0 265.000 9
661 Yelahanka 2 居室 1120 - 1145 2.0 48.130 2
672 Bettahalsoor 4 居室 3090 - 5002 4.0 445.000 4
上面显示,total_sqft 可以是一个范围(例如 2100-2850)。对于这种情况,我们只需取范围内的最小值和最大值的平均值即可。还有其他情况,例如 34.46 平方米,可以使用单位换算转换为平方英尺。为了简单起见,我将直接删除此类极端情况。
def convert_sqft_to_num(x):
    tokens = x.split('-')
    if len(tokens) == 2:
        return (float(tokens[0]) + float(tokens[1]))/2
    try:
        return float(x)
    except:
        return None
        
df4 = df3.copy()
df4['total_sqft'] = df4['total_sqft'].apply(convert_sqft_to_num)
df5 = df4.copy()
添加一个名为每平方英尺价格的新特征
df5['price_per_sqft'] = df5['price']*100000/df5['total_sqft']
len(df5.location.unique())
1304

检查位置,这是一个分类变量。我们需要在这里应用降维技术来减少位置的数量。

df5.location = df5.location.apply(lambda x: x.strip())
location_stats = df5.groupby('location')['location'].agg('count').sort_values(ascending=False)
location_stats
location
Whitefield           535
Sarjapur  Road       392
Electronic City      304
Kanakpura Road       266
Thanisandra          236
                    ... 
LIC Colony             1
Kuvempu Layout         1
Kumbhena Agrahara      1
Kudlu Village,         1
1 Annasandrapalya      1
Name: location, Length: 1293, dtype: int64
len(location_stats[location_stats&lt;=10])
1052

降维

任何数据点少于 10 个的位置都应标记为“其他”位置。这样,类别数量可以大幅减少。稍后,当我们进行独热编码时,这将有助于我们拥有更少的虚拟列。

location_stats_less_10 = location_stats[location_stats&lt;=10]
location_stats_less_10
location
BTM 1st Stage          10
Basapura               10
Sector 1 HSR Layout    10
Naganathapura          10
Kalkere                10
                       ..
LIC Colony              1
Kuvempu Layout          1
Kumbhena Agrahara       1
Kudlu Village,          1
1 Annasandrapalya       1
Name: location, Length: 1052, dtype: int64
len(df5.location.unique())
1293
df5.location = df5.location.apply(lambda x: 'other' if x in location_stats_less_10 else x)
len(df5.location.unique())
242
df5.head()
位置 尺寸 总面积(平方英尺) 浴室数量 价格 卧室数量 每平方英尺价格
0 电子城二期 2 居室 1056.0 2.0 39.07 2 3699.810606
1 Chikka Tirupathi 4 居室 2600.0 5.0 120.00 4 4615.384615
2 Uttarahalli 3 居室 1440.0 2.0 62.00 3 4305.555556
3 Lingadheeranahalli 3 居室 1521.0 3.0 95.00 3 6245.890861
4 Kothanur 2 居室 1200.0 2.0 51.00 2 4250.000000

使用业务逻辑去除异常值

df5[df5.total_sqft/df5.bhk&lt;300].head()
位置 尺寸 总面积(平方英尺) 浴室数量 价格 卧室数量 每平方英尺价格
9 其他 6 居室 1020.0 6.0 370.0 6 36274.509804
45 HSR 布局 8 居室 600.0 9.0 200.0 8 33333.333333
58 Murugeshpalya 6 居室 1407.0 4.0 150.0 6 10660.980810
68 Devarachikkanahalli 8 居室 1350.0 7.0 85.0 8 6296.296296
70 其他 3 居室 500.0 3.0 100.0 3 20000.000000
检查上面的数据点。我们有 1020 平方英尺的 6 居室公寓。另一个是 8 居室,总面积为 600 平方英尺。这些都是明显的数据错误,可以安全地删除。
df5.shape
(13246, 7)
df6 = df5[~(df5.total_sqft/df5.bhk&lt;300)]
df6.shape
(12502, 7)

使用标准差和均值去除异常值

df6.price_per_sqft.describe()
count     12456.000000
mean       6308.502826
std        4168.127339
min         267.829813
25%        4210.526316
50%        5294.117647
75%        6916.666667
max      176470.588235
Name: price_per_sqft, dtype: float64
def remove_pps_outliers(df):
    df_out = pd.DataFrame()
    for key, subdf in df.groupby('location'):
        m = np.mean(subdf.price_per_sqft)
        st = np.std(subdf.price_per_sqft)
        reduced_df = subdf[(subdf.price_per_sqft&gt;(m-st)) & (subdf.price_per_sqft&lt;=(m+st))]
        df_out = pd.concat([df_out,reduced_df],ignore_index=True)
    return df_out    
df7 = remove_pps_outliers(df6)
df7.shape
(10241, 7)
def plot_scatter_chart(df,location):
    bhk2 = df[(df.location==location) & (df.bhk==2)]
    bhk3 = df[(df.location==location) & (df.bhk==3)]
    matplotlib.rcParams['figure.figsize'] = (15,10)
    plt.scatter(bhk2.total_sqft,bhk2.price,color='blue',label='2 BHK', s=50)
    plt.scatter(bhk3.total_sqft,bhk3.price,marker='+', color='green',label='3 BHK', s=50)
    plt.xlabel("Total Square Feet Area")
    plt.ylabel("Price (Lakh Indian Rupees)")
    plt.title(location)
    plt.legend()
    
plot_scatter_chart(df7,"Rajaji Nagar")
plot_scatter_chart(df7,"Hebbal")
def remove_bhk_outliers(df):
    exclude_indices = np.array([])
    for location,location_df in df.groupby('location'):
        bhk_stats = {}
        for bhk,bhk_df in location_df.groupby('bhk'):
            bhk_stats[bhk] = {
                'mean':np.mean(bhk_df.price_per_sqft),
                'std':np.std(bhk_df.price_per_sqft),
                'count':bhk_df.shape[0]
            }
        for bhk,bhk_df in location_df.groupby("bhk"):
            stats = bhk_stats.get(bhk-1)
            if stats and stats['count']&gt;5:
                exclude_indices = np.append(exclude_indices,bhk_df[bhk_df.price_per_sqft &lt; (stats['mean'])].index.values)
    return df.drop(exclude_indices,axis='index')       
df8 = remove_bhk_outliers(df7)
df8.shape
(7329, 7)

使用浴室特征去除异常值

df8.bath.unique()
array([ 4.,  3.,  2.,  5.,  8.,  1.,  6.,  7.,  9., 12., 16., 13.])
plt.hist(df8.bath,rwidth=0.8)
plt.xlabel("Number of bathrooms")
plt.ylabel("Count")
Text(0, 0.5, &#x27;Count&#x27;)
df8[df8.bath&gt;10]
位置 尺寸 总面积(平方英尺) 浴室数量 价格 卧室数量 每平方英尺价格
5277 Neeladri Nagar 10 居室 4000.0 12.0 160.0 10 4000.000000
8486 其他 10 居室 12000.0 12.0 525.0 10 4375.000000
8575 其他 16 居室 10000.0 16.0 550.0 16 5500.000000
9308 其他 11 居室 6000.0 12.0 150.0 11 2500.000000
9639 其他 13 居室 5425.0 13.0 275.0 13 5069.124424
住宅中浴室数量比卧室数量多 2 个以上是不寻常的。
df8[df8.bath&gt;df8.bhk + 2]
位置 尺寸 总面积(平方英尺) 浴室数量 价格 卧室数量 每平方英尺价格
1626 Chikkabanavar 4 居室 2460.0 7.0 80.0 4 3252.032520
5238 Nagasandra 4 居室 7000.0 8.0 450.0 4 6428.571429
6711 Thanisandra 3 居室 1806.0 6.0 116.0 3 6423.034330
8411 其他 6 居室 11338.0 9.0 1000.0 6 8819.897689
df9 = df8[df8.bath &lt; df8.bhk + 2]
df9.shape
(7251, 7)
df9
位置 尺寸 总面积(平方英尺) 浴室数量 价格 卧室数量 每平方英尺价格
0 第一街 Jayanagar 4 居室 2850.0 4.0 428.0 4 15017.543860
1 第一街 Jayanagar 3 居室 1630.0 3.0 194.0 3 11901.840491
2 第一街 Jayanagar 3 居室 1875.0 2.0 235.0 3 12533.333333
3 第一街 Jayanagar 3 居室 1200.0 2.0 130.0 3 10833.333333
4 第一街 Jayanagar 2 居室 1235.0 2.0 148.0 2 11983.805668
... ... ... ... ... ... ... ...
10232 其他 2 居室 1200.0 2.0 70.0 2 5833.333333
10233 其他 1 居室 1800.0 1.0 200.0 1 11111.111111
10236 其他 2 居室 1353.0 2.0 110.0 2 8130.081301
10237 其他 1 居室 812.0 1.0 26.0 1 3201.970443
10240 其他 4 居室 3600.0 5.0 400.0 4 11111.111111

7251 行 × 7 列

df10 = df9.drop(['size','price_per_sqft'],axis = 'columns')
df10.head()
位置 总面积(平方英尺) 浴室数量 价格 卧室数量
0 第一街 Jayanagar 2850.0 4.0 428.0 4
1 第一街 Jayanagar 1630.0 3.0 194.0 3
2 第一街 Jayanagar 1875.0 2.0 235.0 3
3 第一街 Jayanagar 1200.0 2.0 130.0 3
4 第一街 Jayanagar 1235.0 2.0 148.0 2

对位置使用独热编码

对于不存在此类序数关系的分类变量,整数编码是不够的。事实上,使用此编码并允许模型假设类别之间的自然顺序可能会导致性能下降或出现意外结果(预测介于类别之间)。在这种情况下,可以对整数表示应用独热编码。即将整数编码的变量删除,并为每个唯一的整数值添加一个新的二进制变量。
dummies = pd.get_dummies(df10.location)
dummies.head()
第一街 Jayanagar 第一期 JP Nagar 第二期司法布局 第二阶段 Nagarbhavi 第五街 Hbr 布局 第五期 JP Nagar 第六期 JP Nagar 第七期 JP Nagar 8 期 JP Nagar 第九期 JP Nagar ... Vishveshwarya 布局 Vishwapriya 布局 Vittasandra Whitefield Yelachenahalli Yelahanka Yelahanka 新镇 Yelenahalli Yeshwanthpur 其他
0 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 行 × 242 列

df11 = pd.concat([df10,dummies],axis = 'columns')
df11 = df11.drop(['other'],axis = 'columns')
df11.head()
位置 总面积(平方英尺) 浴室数量 价格 卧室数量 第一街 Jayanagar 第一期 JP Nagar 第二期司法布局 第二阶段 Nagarbhavi 第五街 Hbr 布局 ... Vijayanagar Vishveshwarya 布局 Vishwapriya 布局 Vittasandra Whitefield Yelachenahalli Yelahanka Yelahanka 新镇 Yelenahalli Yeshwanthpur
0 第一街 Jayanagar 2850.0 4.0 428.0 4 1 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 第一街 Jayanagar 1630.0 3.0 194.0 3 1 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 第一街 Jayanagar 1875.0 2.0 235.0 3 1 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 第一街 Jayanagar 1200.0 2.0 130.0 3 1 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 第一街 Jayanagar 1235.0 2.0 148.0 2 1 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 行 × 246 列

df12 = df11.drop(['location'],axis = 'columns')
df12.head()
总面积(平方英尺) 浴室数量 价格 卧室数量 第一街 Jayanagar 第一期 JP Nagar 第二期司法布局 第二阶段 Nagarbhavi 第五街 Hbr 布局 第五期 JP Nagar ... Vijayanagar Vishveshwarya 布局 Vishwapriya 布局 Vittasandra Whitefield Yelachenahalli Yelahanka Yelahanka 新镇 Yelenahalli Yeshwanthpur
0 2850.0 4.0 428.0 4 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 1630.0 3.0 194.0 3 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 1875.0 2.0 235.0 3 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 1200.0 2.0 130.0 3 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 1235.0 2.0 148.0 2 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 行 × 245 列

df12.shape
(7251, 245)
X = df12.drop('price',axis='columns')
X.head()
总面积(平方英尺) 浴室数量 卧室数量 第一街 Jayanagar 第一期 JP Nagar 第二期司法布局 第二阶段 Nagarbhavi 第五街 Hbr 布局 第五期 JP Nagar 第六期 JP Nagar ... Vijayanagar Vishveshwarya 布局 Vishwapriya 布局 Vittasandra Whitefield Yelachenahalli Yelahanka Yelahanka 新镇 Yelenahalli Yeshwanthpur
0 2850.0 4.0 4 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 1630.0 3.0 3 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 1875.0 2.0 3 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 1200.0 2.0 3 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 1235.0 2.0 2 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 行 × 244 列

y = df12.price
y
0        428.0
1        194.0
2        235.0
3        130.0
4        148.0
         ...  
10232     70.0
10233    200.0
10236    110.0
10237     26.0
10240    400.0
Name: price, Length: 7251, dtype: float64
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2,random_state = 10)
from sklearn.linear_model import LinearRegression
lr_clf = LinearRegression()
lr_clf.fit(X_train,y_train)
lr_clf.score(X_test,y_test)
0.8452277697873348

使用 K 折交叉验证来衡量线性回归模型的准确性

在这种方法中,我们将数据集分成 k 个子集(称为折叠),然后我们对所有子集进行训练,但保留一个(k-1)子集用于评估训练后的模型。在这种方法中,我们迭代 k 次,每次都保留一个不同的子集用于测试目的。始终记住,较小的 k 值偏差更大,因此不可取。另一方面,较大的 K 值偏差较小,但可能存在较大的可变性。重要的是要知道,较小的 k 值始终使我们走向验证集方法,而较大的 k 值会导致 LOOCV 方法。

from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import cross_val_score
cv = ShuffleSplit(n_splits = 5, test_size = 0.2, random_state = 0)
cross_val_score(LinearRegression(),X,y,cv=cv)
array([0.82430186, 0.77166234, 0.85089567, 0.80837764, 0.83653286])

使用 GridSearchCV 查找最佳模型

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor

def find_best_model_using_gridsearchcv(X,y):
    algos = {
        'linear_regression' : {
            'model': LinearRegression(),
            'params': {
                'normalize': [True, False]
            }
        },
        'lasso': {
            'model': Lasso(),
            'params': {
                'alpha': [1,2],
                'selection': ['random', 'cyclic']
            }
        },
        'decision_tree': {
            'model': DecisionTreeRegressor(),
            'params': {
                'criterion' : ['mse','friedman_mse'],
                'splitter': ['best','random']
            }
        }
    }
    scores = []
    cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
    for algo_name, config in algos.items():
        gs =  GridSearchCV(config['model'], config['params'], cv=cv, return_train_score=False)
        gs.fit(X,y)
        scores.append({
            'model': algo_name,
            'best_score': gs.best_score_,
            'best_params': gs.best_params_
        })

    return pd.DataFrame(scores,columns=['model','best_score','best_params'])

find_best_model_using_gridsearchcv(X,y)
模型 最佳得分 最佳参数
0 线性回归 0.818354 {'normalize': False}
1 套索回归 0.687430 {'alpha': 2, 'selection': 'random'}
2 决策树 0.720273 {'criterion': 'friedman_mse', 'splitter': 'best'}

根据以上结果,我们可以说线性回归给出了最佳得分。因此,我们将使用它。

测试一些房产的模型

def predict_price(location,sqft,bath,bhk):
    loc_index = np.where(X.columns==location)[0][0]
    
    x = np.zeros(len(X.columns))
    x[0] = sqft
    x[1] = bath
    x[2] = bhk
    if loc_index &gt;= 0:
        x[loc_index] = 1
    return lr_clf.predict([x])[0]    
      
predict_price('1st Phase JP Nagar',1000,2,2)
83.49904676591962
predict_price('Indira Nagar',1000, 3, 3)
184.58430202040012