{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## This notebook is part of the Spark training delivered by CERN IT\n", "### Regression with spark.ml\n", "Contact: Luca.Canali@cern.ch\n", "\n", "This notebook is an implementation of a regression system trained using `spark.ml` to predict house prices.\n", "\n", "The data used for this exercise is the \"California Housing Prices dataset\" from the StatLib repository, originally featured in the following paper: Pace, R. Kelley, and Ronald Barry. \"Sparse spatial autoregressions.\" Statistics & Probability Letters 33.3 (1997): 291-297.\n", "The code and steps we follow in this notebook are inspired by the book \"Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, Aurelien Geron, 2nd Edition\".\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run this notebook from Jupyter with a Python kernel\n", "- When running on CERN SWAN, do not attach the notebook to a Spark cluster, but rather run locally on the SWAN container\n", "- If running this outside CERN SWAN, please make sure to have PySpark installed: `pip install pyspark`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create the Spark session and read the data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\n", "# Local mode: run this when using CERN SWAN not connected to a cluster \n", "# or run it on a private Jupyter notebook instance\n", "# Dependency: PySpark (use SWAN or pip install pyspark)\n", "#\n", "\n", "from pyspark.sql import SparkSession\n", "spark = SparkSession.builder \\\n", " .master(\"local[*]\") \\\n", " .appName(\"ML HandsOn Regression\") \\\n", " .config(\"spark.driver.memory\",\"4g\") \\\n", " .config(\"spark.ui.showConsoleProgress\", \"false\") \\\n", " .getOrCreate()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
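<div>\n", " <p><b>SparkSession - in-memory</b></p>\n", " <div>\n", " <p><b>SparkContext</b></p>\n", " <p>Spark UI</p>\n", " <dl>\n", " <dt>Version</dt>\n", " <dd><code>v3.3.1</code></dd>\n", " <dt>Master</dt>\n", " <dd><code>local[*]</code></dd>\n", " <dt>AppName</dt>\n", " <dd><code>ML HandsOn Regression</code></dd>\n", " </dl>\n", " </div>\n", " </div>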
\n", " " ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spark" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Local mode: read the data locally from the cloned repo\n", "df = (spark.read\n", " .format(\"csv\")\n", " .option(\"header\",\"true\")\n", " .option(\"inferSchema\",\"true\")\n", " .load(\"../data/housing.csv.gz\")\n", " )\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Split the data into training and test datasets" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "16526" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 80/20 split with a fixed seed for reproducibility\n", "train, test = df.randomSplit([0.8, 0.2], seed=4242)\n", "\n", "# cache the training dataset\n", "train.cache().count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic data exploration" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "root\n", " |-- longitude: double (nullable = true)\n", " |-- latitude: double (nullable = true)\n", " |-- housing_median_age: double (nullable = true)\n", " |-- total_rooms: double (nullable = true)\n", " |-- total_bedrooms: double (nullable = true)\n", " |-- population: double (nullable = true)\n", " |-- households: double (nullable = true)\n", " |-- median_income: double (nullable = true)\n", " |-- median_house_value: double (nullable = true)\n", " |-- ocean_proximity: string (nullable = true)\n", "\n" ] } ], "source": [ "train.printSchema()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>longitude</th><th>latitude</th><th>housing_median_age</th><th>total_rooms</th><th>total_bedrooms</th><th>population</th><th>households</th><th>median_income</th><th>median_house_value</th><th>ocean_proximity</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>-124.35</td><td>40.54</td><td>52.0</td><td>1820.0</td><td>300.0</td><td>806.0</td><td>270.0</td><td>3.0147</td><td>94600.0</td><td>NEAR OCEAN</td></tr>\n",
"<tr><th>1</th><td>-124.30</td><td>41.80</td><td>19.0</td><td>2672.0</td><td>552.0</td><td>1298.0</td><td>478.0</td><td>1.9797</td><td>85800.0</td><td>NEAR OCEAN</td></tr>\n",
"<tr><th>2</th><td>-124.30</td><td>41.84</td><td>17.0</td><td>2677.0</td><td>531.0</td><td>1244.0</td><td>456.0</td><td>3.0313</td><td>103600.0</td><td>NEAR OCEAN</td></tr>\n",
"<tr><th>3</th><td>-124.27</td><td>40.69</td><td>36.0</td><td>2349.0</td><td>528.0</td><td>1194.0</td><td>465.0</td><td>2.5179</td><td>79000.0</td><td>NEAR OCEAN</td></tr>\n",
"<tr><th>4</th><td>-124.26</td><td>40.58</td><td>52.0</td><td>2217.0</td><td>394.0</td><td>907.0</td><td>369.0</td><td>2.3571</td><td>111400.0</td><td>NEAR OCEAN</td></tr>\n",
"<tr><th>5</th><td>-124.25</td><td>40.28</td><td>32.0</td><td>1430.0</td><td>419.0</td><td>434.0</td><td>187.0</td><td>1.9417</td><td>76100.0</td><td>NEAR OCEAN</td></tr>\n",
"<tr><th>6</th><td>-124.23</td><td>40.54</td><td>52.0</td><td>2694.0</td><td>453.0</td><td>1152.0</td><td>435.0</td><td>3.0806</td><td>106700.0</td><td>NEAR OCEAN</td></tr>\n",
"<tr><th>7</th><td>-124.23</td><td>41.75</td><td>11.0</td><td>3159.0</td><td>616.0</td><td>1343.0</td><td>479.0</td><td>2.4805</td><td>73200.0</td><td>NEAR OCEAN</td></tr>\n",
"<tr><th>8</th><td>-124.22</td><td>41.73</td><td>28.0</td><td>3003.0</td><td>699.0</td><td>1530.0</td><td>653.0</td><td>1.7038</td><td>78300.0</td><td>NEAR OCEAN</td></tr>\n",
"<tr><th>9</th><td>-124.21</td><td>40.75</td><td>32.0</td><td>1218.0</td><td>331.0</td><td>620.0</td><td>268.0</td><td>1.6528</td><td>58100.0</td><td>NEAR OCEAN</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " longitude latitude housing_median_age total_rooms total_bedrooms \\\n", "0 -124.35 40.54 52.0 1820.0 300.0 \n", "1 -124.30 41.80 19.0 2672.0 552.0 \n", "2 -124.30 41.84 17.0 2677.0 531.0 \n", "3 -124.27 40.69 36.0 2349.0 528.0 \n", "4 -124.26 40.58 52.0 2217.0 394.0 \n", "5 -124.25 40.28 32.0 1430.0 419.0 \n", "6 -124.23 40.54 52.0 2694.0 453.0 \n", "7 -124.23 41.75 11.0 3159.0 616.0 \n", "8 -124.22 41.73 28.0 3003.0 699.0 \n", "9 -124.21 40.75 32.0 1218.0 331.0 \n", "\n", " population households median_income median_house_value ocean_proximity \n", "0 806.0 270.0 3.0147 94600.0 NEAR OCEAN \n", "1 1298.0 478.0 1.9797 85800.0 NEAR OCEAN \n", "2 1244.0 456.0 3.0313 103600.0 NEAR OCEAN \n", "3 1194.0 465.0 2.5179 79000.0 NEAR OCEAN \n", "4 907.0 369.0 2.3571 111400.0 NEAR OCEAN \n", "5 434.0 187.0 1.9417 76100.0 NEAR OCEAN \n", "6 1152.0 435.0 3.0806 106700.0 NEAR OCEAN \n", "7 1343.0 479.0 2.4805 73200.0 NEAR OCEAN \n", "8 1530.0 653.0 1.7038 78300.0 NEAR OCEAN \n", "9 620.0 268.0 1.6528 58100.0 NEAR OCEAN " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The dataset reports housing prices in California from the 1990s\n", "train.limit(10).toPandas()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+---------------+--------+\n", "|ocean_proximity|count(1)|\n", "+---------------+--------+\n", "| ISLAND| 3|\n", "| NEAR OCEAN| 2133|\n", "| NEAR BAY| 1860|\n", "| <1H OCEAN| 7298|\n", "| INLAND| 5232|\n", "+---------------+--------+\n", "\n" ] } ], "source": [ "train.createOrReplaceTempView(\"train\")\n", "spark.sql(\"select ocean_proximity, count(*) from train group by ocean_proximity\").show()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------+\n", "|count(1)|\n", "+--------+\n", "| 175|\n", "+--------+\n", "\n" ] } ],
"source": [ "# there are some missing values in the total_bedrooms feature (i.e. null values)\n", "\n", "spark.sql(\"select count(*) from train where total_bedrooms is null\").show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature preparation" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "from pyspark.ml.feature import StringIndexer, OneHotEncoder, Imputer, VectorAssembler, StandardScaler\n", "from pyspark.ml import Pipeline" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Transform the ocean_proximity feature into a one-hot encoded feature\n", "ocean_index = StringIndexer(inputCol=\"ocean_proximity\",outputCol=\"indexed_ocean_proximity\")\n", "ocean_onehot = OneHotEncoder(inputCol=\"indexed_ocean_proximity\",outputCol=\"oh_ocean_proximity\",dropLast=False)\n", "\n", "# Fill in the missing values of the total_bedrooms feature with the median (imputation)\n", "imputer_tot_br = Imputer(strategy='median',inputCols=[\"total_bedrooms\"],outputCols=[\"total_bedrooms_filled\"])\n", "\n", "# Note: oh_ocean_proximity is not in this list, so the one-hot encoded column is not used by the model below\n", "features = [\"longitude\", \"latitude\", \"housing_median_age\", \n", " \"total_rooms\", \"population\", \"households\", \n", " \"median_income\", \"total_bedrooms_filled\"]\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Build a pipeline, bundling the feature preparation steps" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "feature_preparation_pipeline = Pipeline(stages=[ocean_index,ocean_onehot,imputer_tot_br])" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>longitude</th><th>latitude</th><th>housing_median_age</th><th>total_rooms</th><th>total_bedrooms</th><th>population</th><th>households</th><th>median_income</th><th>median_house_value</th><th>ocean_proximity</th><th>indexed_ocean_proximity</th><th>oh_ocean_proximity</th><th>total_bedrooms_filled</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>-124.35</td><td>40.54</td><td>52.0</td><td>1820.0</td><td>300.0</td><td>806.0</td><td>270.0</td><td>3.0147</td><td>94600.0</td><td>NEAR OCEAN</td><td>2.0</td><td>(0.0, 0.0, 1.0, 0.0, 0.0)</td><td>300.0</td></tr>\n",
"<tr><th>1</th><td>-124.30</td><td>41.80</td><td>19.0</td><td>2672.0</td><td>552.0</td><td>1298.0</td><td>478.0</td><td>1.9797</td><td>85800.0</td><td>NEAR OCEAN</td><td>2.0</td><td>(0.0, 0.0, 1.0, 0.0, 0.0)</td><td>552.0</td></tr>\n",
"<tr><th>2</th><td>-124.30</td><td>41.84</td><td>17.0</td><td>2677.0</td><td>531.0</td><td>1244.0</td><td>456.0</td><td>3.0313</td><td>103600.0</td><td>NEAR OCEAN</td><td>2.0</td><td>(0.0, 0.0, 1.0, 0.0, 0.0)</td><td>531.0</td></tr>\n",
"<tr><th>3</th><td>-124.27</td><td>40.69</td><td>36.0</td><td>2349.0</td><td>528.0</td><td>1194.0</td><td>465.0</td><td>2.5179</td><td>79000.0</td><td>NEAR OCEAN</td><td>2.0</td><td>(0.0, 0.0, 1.0, 0.0, 0.0)</td><td>528.0</td></tr>\n",
"<tr><th>4</th><td>-124.26</td><td>40.58</td><td>52.0</td><td>2217.0</td><td>394.0</td><td>907.0</td><td>369.0</td><td>2.3571</td><td>111400.0</td><td>NEAR OCEAN</td><td>2.0</td><td>(0.0, 0.0, 1.0, 0.0, 0.0)</td><td>394.0</td></tr>\n",
"<tr><th>5</th><td>-124.25</td><td>40.28</td><td>32.0</td><td>1430.0</td><td>419.0</td><td>434.0</td><td>187.0</td><td>1.9417</td><td>76100.0</td><td>NEAR OCEAN</td><td>2.0</td><td>(0.0, 0.0, 1.0, 0.0, 0.0)</td><td>419.0</td></tr>\n",
"<tr><th>6</th><td>-124.23</td><td>40.54</td><td>52.0</td><td>2694.0</td><td>453.0</td><td>1152.0</td><td>435.0</td><td>3.0806</td><td>106700.0</td><td>NEAR OCEAN</td><td>2.0</td><td>(0.0, 0.0, 1.0, 0.0, 0.0)</td><td>453.0</td></tr>\n",
"<tr><th>7</th><td>-124.23</td><td>41.75</td><td>11.0</td><td>3159.0</td><td>616.0</td><td>1343.0</td><td>479.0</td><td>2.4805</td><td>73200.0</td><td>NEAR OCEAN</td><td>2.0</td><td>(0.0, 0.0, 1.0, 0.0, 0.0)</td><td>616.0</td></tr>\n",
"<tr><th>8</th><td>-124.22</td><td>41.73</td><td>28.0</td><td>3003.0</td><td>699.0</td><td>1530.0</td><td>653.0</td><td>1.7038</td><td>78300.0</td><td>NEAR OCEAN</td><td>2.0</td><td>(0.0, 0.0, 1.0, 0.0, 0.0)</td><td>699.0</td></tr>\n",
"<tr><th>9</th><td>-124.21</td><td>40.75</td><td>32.0</td><td>1218.0</td><td>331.0</td><td>620.0</td><td>268.0</td><td>1.6528</td><td>58100.0</td><td>NEAR OCEAN</td><td>2.0</td><td>(0.0, 0.0, 1.0, 0.0, 0.0)</td><td>331.0</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " longitude latitude housing_median_age total_rooms total_bedrooms \\\n", "0 -124.35 40.54 52.0 1820.0 300.0 \n", "1 -124.30 41.80 19.0 2672.0 552.0 \n", "2 -124.30 41.84 17.0 2677.0 531.0 \n", "3 -124.27 40.69 36.0 2349.0 528.0 \n", "4 -124.26 40.58 52.0 2217.0 394.0 \n", "5 -124.25 40.28 32.0 1430.0 419.0 \n", "6 -124.23 40.54 52.0 2694.0 453.0 \n", "7 -124.23 41.75 11.0 3159.0 616.0 \n", "8 -124.22 41.73 28.0 3003.0 699.0 \n", "9 -124.21 40.75 32.0 1218.0 331.0 \n", "\n", " population households median_income median_house_value ocean_proximity \\\n", "0 806.0 270.0 3.0147 94600.0 NEAR OCEAN \n", "1 1298.0 478.0 1.9797 85800.0 NEAR OCEAN \n", "2 1244.0 456.0 3.0313 103600.0 NEAR OCEAN \n", "3 1194.0 465.0 2.5179 79000.0 NEAR OCEAN \n", "4 907.0 369.0 2.3571 111400.0 NEAR OCEAN \n", "5 434.0 187.0 1.9417 76100.0 NEAR OCEAN \n", "6 1152.0 435.0 3.0806 106700.0 NEAR OCEAN \n", "7 1343.0 479.0 2.4805 73200.0 NEAR OCEAN \n", "8 1530.0 653.0 1.7038 78300.0 NEAR OCEAN \n", "9 620.0 268.0 1.6528 58100.0 NEAR OCEAN \n", "\n", " indexed_ocean_proximity oh_ocean_proximity total_bedrooms_filled \n", "0 2.0 (0.0, 0.0, 1.0, 0.0, 0.0) 300.0 \n", "1 2.0 (0.0, 0.0, 1.0, 0.0, 0.0) 552.0 \n", "2 2.0 (0.0, 0.0, 1.0, 0.0, 0.0) 531.0 \n", "3 2.0 (0.0, 0.0, 1.0, 0.0, 0.0) 528.0 \n", "4 2.0 (0.0, 0.0, 1.0, 0.0, 0.0) 394.0 \n", "5 2.0 (0.0, 0.0, 1.0, 0.0, 0.0) 419.0 \n", "6 2.0 (0.0, 0.0, 1.0, 0.0, 0.0) 453.0 \n", "7 2.0 (0.0, 0.0, 1.0, 0.0, 0.0) 616.0 \n", "8 2.0 (0.0, 0.0, 1.0, 0.0, 0.0) 699.0 \n", "9 2.0 (0.0, 0.0, 1.0, 0.0, 0.0) 331.0 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# fit the feature preparation pipeline with the training data\n", "feature_preparation_transformer = feature_preparation_pipeline.fit(train)\n", "\n", "# show a sample of data after feature preparation\n", "feature_preparation_transformer.transform(train).limit(10).toPandas()" ] }, { "cell_type": "markdown", "metadata": {},
"source": [ "### Further data preparation\n", "\n", "VectorAssembler combines the selected feature columns into a single vector column, which is the input format required by the Spark ML algorithms. \n", "StandardScaler rescales each feature. With its default settings in Spark (`withStd=True`, `withMean=False`) it scales each feature to unit standard deviation without centering it, so the mean is not shifted to 0; set `withMean=True` to also center the data. " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "assembler = VectorAssembler(inputCols=features, outputCol=\"unscaled_features\")\n", "\n", "std_scaler = StandardScaler(inputCol=\"unscaled_features\", outputCol=\"features\")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "full_feature_preparation_pipeline = Pipeline(stages=[feature_preparation_pipeline,assembler,std_scaler])" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>unscaled_features</th><th>features</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>[-124.35, 40.54, 52.0, 1820.0, 806.0, 270.0, 3...</td><td>[-62.03326688689804, 18.957877035296093, 4.137...</td></tr>\n",
"<tr><th>1</th><td>[-124.3, 41.8, 19.0, 2672.0, 1298.0, 478.0, 1....</td><td>[-62.00832387648916, 19.547095709802086, 1.511...</td></tr>\n",
"<tr><th>2</th><td>[-124.3, 41.84, 17.0, 2677.0, 1244.0, 456.0, 3...</td><td>[-62.00832387648916, 19.56580106454831, 1.3527...</td></tr>\n",
"<tr><th>3</th><td>[-124.27, 40.69, 36.0, 2349.0, 1194.0, 465.0, ...</td><td>[-61.99335807024382, 19.028022115594425, 2.864...</td></tr>\n",
"<tr><th>4</th><td>[-124.26, 40.58, 52.0, 2217.0, 907.0, 369.0, 2...</td><td>[-61.988369468162055, 18.976582390042314, 4.13...</td></tr>\n",
"<tr><th>5</th><td>[-124.25, 40.28, 32.0, 1430.0, 434.0, 187.0, 1...</td><td>[-61.983380866080275, 18.83629222944565, 2.546...</td></tr>\n",
"<tr><th>6</th><td>[-124.23, 40.54, 52.0, 2694.0, 1152.0, 435.0, ...</td><td>[-61.97340366191672, 18.957877035296093, 4.137...</td></tr>\n",
"<tr><th>7</th><td>[-124.23, 41.75, 11.0, 3159.0, 1343.0, 479.0, ...</td><td>[-61.97340366191672, 19.52371401636931, 0.8753...</td></tr>\n",
"<tr><th>8</th><td>[-124.22, 41.73, 28.0, 3003.0, 1530.0, 653.0, ...</td><td>[-61.96841505983494, 19.5143613389962, 2.22808...</td></tr>\n",
"<tr><th>9</th><td>[-124.21, 40.75, 32.0, 1218.0, 620.0, 268.0, 1...</td><td>[-61.96342645775316, 19.056080147713757, 2.546...</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " unscaled_features \\\n", "0 [-124.35, 40.54, 52.0, 1820.0, 806.0, 270.0, 3... \n", "1 [-124.3, 41.8, 19.0, 2672.0, 1298.0, 478.0, 1.... \n", "2 [-124.3, 41.84, 17.0, 2677.0, 1244.0, 456.0, 3... \n", "3 [-124.27, 40.69, 36.0, 2349.0, 1194.0, 465.0, ... \n", "4 [-124.26, 40.58, 52.0, 2217.0, 907.0, 369.0, 2... \n", "5 [-124.25, 40.28, 32.0, 1430.0, 434.0, 187.0, 1... \n", "6 [-124.23, 40.54, 52.0, 2694.0, 1152.0, 435.0, ... \n", "7 [-124.23, 41.75, 11.0, 3159.0, 1343.0, 479.0, ... \n", "8 [-124.22, 41.73, 28.0, 3003.0, 1530.0, 653.0, ... \n", "9 [-124.21, 40.75, 32.0, 1218.0, 620.0, 268.0, 1... \n", "\n", " features \n", "0 [-62.03326688689804, 18.957877035296093, 4.137... \n", "1 [-62.00832387648916, 19.547095709802086, 1.511... \n", "2 [-62.00832387648916, 19.56580106454831, 1.3527... \n", "3 [-61.99335807024382, 19.028022115594425, 2.864... \n", "4 [-61.988369468162055, 18.976582390042314, 4.13... \n", "5 [-61.983380866080275, 18.83629222944565, 2.546... \n", "6 [-61.97340366191672, 18.957877035296093, 4.137... \n", "7 [-61.97340366191672, 19.52371401636931, 0.8753... \n", "8 [-61.96841505983494, 19.5143613389962, 2.22808... \n", "9 [-61.96342645775316, 19.056080147713757, 2.546... 
" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# this shows the results of data scaling\n", "full_feature_preparation_transformer = full_feature_preparation_pipeline.fit(train)\n", "\n", "full_feature_preparation_transformer.transform(train).select(\"unscaled_features\",\"features\").limit(10).toPandas()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define the model and assemble a pipeline" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "from pyspark.ml.regression import GBTRegressor\n", "\n", "regressor = GBTRegressor(labelCol=\"median_house_value\", maxIter=40)\n", "\n", "pipeline = Pipeline(stages=[full_feature_preparation_pipeline, regressor])\n", "# this is equivalent to\n", "# pipeline = Pipeline(stages=[ocean_index, ocean_onehot, imputer_tot_br, assembler, std_scaler, regressor])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fit the model using the training dataset" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# model training \n", "# this uses the pipeline built above\n", "# the pipeline puts together transformers and the model and is an estimator\n", "# we are going to fit it to the training data\n", "\n", "model = pipeline.fit(train)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the trained model can be saved on the filesystem\n", "\n", "model.save(\"myTrainedModel\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate the model performance on the test dataset by computing RMSE" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Root Mean Squared Error (RMSE) on test data = 54528.9\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-09-29 21:02:23,307 WARN netlib.InstanceBuilder$NativeBLAS: Failed 
to load implementation from:dev.ludovic.netlib.blas.JNIBLAS\n", "2022-09-29 21:02:23,320 WARN netlib.InstanceBuilder$NativeBLAS: Failed to load implementation from:dev.ludovic.netlib.blas.ForeignLinkerBLAS\n" ] } ], "source": [ "from pyspark.ml.evaluation import RegressionEvaluator\n", "\n", "# compute predictions on the held-out test set and evaluate them with RMSE\n", "predictions = model.transform(test)\n", "dt_evaluator = RegressionEvaluator(\n", " labelCol=\"median_house_value\", predictionCol=\"prediction\", metricName=\"rmse\")\n", "rmse = dt_evaluator.evaluate(predictions)\n", "\n", "print(\"Root Mean Squared Error (RMSE) on test data = %g\" % rmse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Correlation Matrix\n", "The correlation matrix shows the pairwise linear relationship between features. \n", "Correlation ranges from -1 to +1. Values close to zero mean there is little or no linear relationship between the two variables. \n", "It is often displayed using a heatmap." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/cvmfs/sft.cern.ch/lcg/views/LCG_102swan/x86_64-centos7-gcc11-opt/python/pyspark/sql/context.py:125: FutureWarning: Deprecated in 3.0.0. 
Use SparkSession.builder.getOrCreate() instead.\n", " warnings.warn(\n" ] } ], "source": [ "from pyspark.ml.stat import Correlation\n", "\n", "matrix = Correlation.corr(full_feature_preparation_transformer.transform(train).select('features'), 'features')\n", "matrix_np = matrix.collect()[0][\"pearson({})\".format('features')].values" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAyYAAAI4CAYAAACfnP1tAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAABSDUlEQVR4nO3de7yu9Zz/8dd7705UOjjmWJrGKR2UnEIOIYMYTE4JY3LKDOPUYAhjmIlBU6rNL+UUimiaKEIJqXTYHaZEJSmnpFDRrs/vj+ta9r1X67BXe6/1vVbr9exxP9Z1fa/T5773Wnf35/58v98rVYUkSZIktbSodQCSJEmSZGIiSZIkqTkTE0mSJEnNmZhIkiRJas7ERJIkSVJzJiaSJEmSmjMxkSTdQpKXJDl5FY7/apI9VmdMcy3JvZP8Icni1rFI0kJgYiJJA5XkBUlO7z8cX9l/2N+xdVzjJdknyadH26pql6o6bBaudWiSSvKMce0f7ttfspLnuTTJE6fap6ouq6r1quqmVQhZkrSSTEwkaYCS/DPwYeDfgbsC9wY+Cux6K861xsq0zSM/Av5Sjemfy3OBn6yuC8zz10eS5iUTE0kamCQbAO8GXlNVX6qqP1bVjVX1P1X1pn6ftfsqwRX948NJ1u637ZTk8iRvSfIL4BN9VePIJJ9Oci3wkiQbJPl/fTXm50n+bbJuS0k+kuRnSa5N8sMkj+7bnwK8Fditr+yc3bd/O8nL++VFSd6e5KdJfpXkk/1zJMmmfaVjjySXJflNkrdN8xL9D/CoJBv1608BlgK/GIl38yTfTHJVf87PJNmw3/YpukTvf/qY3zwSx98nuQz45kjbGkk27l/Tp/fnWC/Jj5O8eAb/tJKkKZiYSNLwPAJYBzhqin3eBjwc2AbYGtgBePvI9rsBGwP3Afbs23YFjgQ2BD4DHAYsA/4K2BZ4EvDySa53Wn+tjYHPAkckWaeqvkZX1fl83+1p6wmOfUn/eBxwX2A9YP9x++wI3A94AvCOJA+Y4rnfABwNPK9ffzHwyXH7BHgfcHfgAcC9gH0Aqmp34DLg6X3M/zly3GP7/Z88erKq+i3wMuBjSe4CfAg4q6rGX1eSdCuZmEjS8NwR+E1VLZtinxcC766qX1XVr4F3AbuPbL8ZeGdV/amqru/bvl9VX66qm4E7ALsAr+srMr+i+7D9PCZQVZ+uqquqallVfRBYmy6RWBkvBP6rqi6uqj8A/wI8b1x3qXdV1fVVdTZwNl2yNZVPAi/uKy+PBb48Lt4fV9XX++f/a+C/+v2ms0//elw/fkNVHQ8cAZwA/A3wipU4nyRpJdmHVpKG5yrgTknWmCI5uTvw05H1n/ZtY35dVTeMO+ZnI8v3AdYErkwy1rZo3D5/keQNdNWUuwNFl9jcafqnMmmsa9CNnRnzi5Hl6+iqKpOqqpOT3JmuSnRMVV0/8jzoqxr7AY8G1qd7blevRKwTPv8RS4C9gH+vqqtW4nySpJVkxUSShuf7dN2VnjnFPlfQJRdj7t23jakJjhlt+xnwJ+BOVbVh/7hDVT1o/EH9eJ
K3AH8HbFRVGwLX0HWXmuxa08W6DPjlNMdN59PAG7hlNy7ounEVsFVV3QF4EcvjhcljnvS59ONvDu6v96okf3VrgpYkTczERJIGpqquAd4BHJDkmUlun2TNJLskGRsPcTjw9iR3TnKnfv9PT3bOCa5xJXA88MEkd+gHqG+eZKLuTuvTJRK/BtZI8g66ismYXwKbJpns/ymHA69PslmS9Vg+JmWqrmorYz9gZ+CkSWL+A/C7JPcA3jRu+y/pxrvMxFv7ny8DPgB80nucSNLqY2IiSQNUVf8F/DNdV6Vf01U49mL5WIp/A06nm43qHOCMvm0mXgysBZxP183pSGCTCfY7Dvgq3TS9P6Wr5ox2eTqi/3lVkjMmOP4Q4FN0CcQl/fGvnWGst1BVv62qE6pqoirHu4CH0FV2/hf40rjt76NL7H6X5I3TXSvJdnT/Hi/u72vyH3TVlb1X5TlIkpbLxO/nkiRJkjR3rJhIkiRJas7ERJIkSVqAkhzS3/j23Em2J8l+/Q1llyZ5yMi2pyS5sN+2Wrq1mphIkiRJC9OhwFOm2L4LsEX/2BM4EP4yS+EB/fYHAs9P8sBVDcbERJIkSVqAquok4LdT7LIr8MnqnAJsmGQTYAfgx/2Nc/8MfK7fd5V4g8UF4Hbb7uUMB9O4+rT9W4cweAd89+LWIQzeT64afz9DjXf9n29qHcKgrbOmsw9PZ/11fI2m87vrVnUm7tu+Jc99UKbfa+7M1me1G8464BV0lY4xS6pqyQxOcQ9WnIXx8r5tovaH3do4x5iYSJIkSbdBfRIyk0RkvIkSuJqifZWYmEiSJEktTXp/2uYuB+41sn5P4Aq6e2BN1L5KBvsqSJIkSWrqaODF/excDweuqaorgdOALZJslmQt4Hn9vqvEiokkSZLUUtoMeUlyOLATcKcklwPvBNYEqKqDgGOBpwI/Bq4DXtpvW5ZkL+A4YDFwSFWdt6rxmJhIkiRJLTXqylVVz59mewGvmWTbsXSJy2pjVy5JkiRJzVkxkSRJklpq1JVraKyYSJIkSWrOiokkSZLU0nCnC55TJiaSJElSS3blAuzKJUmSJGkArJhIkiRJLdmVC7BiIkmSJGkArJhIkiRJLTnGBDAxkSRJktqyKxdgVy5JkiRJA2DFRJIkSWrJrlyAFRNJkiRJA2DFRJIkSWrJMSaAiYkkSZLUll25ALtySZIkSRoAKyaSJElSS3blAqyY3EKSP8zCOZ+RZO9++ZlJHngrzvHtJNuv7tgkSZKkIbBiMgeq6mjg6H71mcAxwPnNApIkSdJwWDEBrJhMKp19k5yb5Jwku/XtO/XViyOTXJDkM0k3YinJU/u2k5Psl+SYvv0lSfZP8kjgGcC+Sc5KsvloJSTJnZJc2i/fLsnnkixN8nngdiOxPSnJ95OckeSIJOvN7asjSZKk1WZRZucxz1gxmdzfAtsAWwN3Ak5LclK/bVvgQcAVwHeBRyU5HTgYeExVXZLk8PEnrKrvJTkaOKaqjgTI5LMwvAq4rqq2SrIVcEa//52AtwNPrKo/JnkL8M/Au1fDc5YkSZKasGIyuR2Bw6vqpqr6JXAi8NB+26lVdXlV3QycBWwK3B+4uKou6fe5RWIyQ48BPg1QVUuBpX37w4EHAt9NchawB3Cf8Qcn2TPJ6UlOX/ab81YxFEmSJM2aLJqdxzxjxWRyU9W//jSyfBPd63hr62XLWJ4grjNuW00S19er6vlTnbSqlgBLAG637V4TnUeSJEkajPmXSs2dk4DdkixOcme6CsapU+x/AXDfJJv267tNst/vgfVH1i8FtuuXnzPu+i8ESLIlsFXffgpd17G/6rfdPslfr8wTkiRJ0gAls/OYZ0xMJncUXfeps4FvAm+uql9MtnNVXQ+8GvhakpOBXwLXTLDr54A3JTkzyebAB4BXJfke3ViWMQcC6yVZCryZPimqql8DLwEO77edQteNTJIkSfORXbkAu3LdQlWt1/8s4E39Y3T7t4Fvj6zvNb
L5W1V1/36WrgOA0/t9DgUO7Ze/SzdGZNRWI8tv7/e7HnjeJDF+k+XjXSRJkqR5b/6lUsP2D/2A9POADehm6ZIkSZImZ1cuwIrJalVVHwI+1DoOSZIkab4xMZEkSZJamofjQWaDiYkkSZLU0jzsdjUbTM8kSZIkNWfFRJIkSWrJrlyAFRNJkiRJA2DFRJIkSWrJMSaAiYkkSZLUll25ALtySZIkSRoAKyaSJElSS3blAqyYSJIkSRoAKyaSJElSS44xAUxMJEmSpLZMTAC7ckmSJEkaACsmkiRJUksOfgdMTCRJkqS27MoF2JVLkiRJ0gBYMZEkSZJasisXYMVEkiRJ0gBYMZEkSZJaajTGJMlTgI8Ai4GPV9X7x21/E/DCfnUN4AHAnavqt0kuBX4P3AQsq6rtVzUeE5MF4OrT9m8dwuBt9NC9WocweK96l6/RdO54+zVbhzB4d9h4ndYhDNoNy25uHcLgLbbLy7TWW2tx6xA0Uw1+r5MsBg4AdgYuB05LcnRVnT+2T1XtC+zb7/904PVV9duR0zyuqn6zumKyK5ckSZK08OwA/LiqLq6qPwOfA3adYv/nA4fPZkAmJpIkSVJDSWblMY17AD8bWb+8b5sovtsDTwG+ONJcwPFJfphkz1V4+n9hVy5JkiTpNqhPGEaThiVVtWRs8wSH1CSnejrw3XHduB5VVVckuQvw9SQXVNVJqxKviYkkSZLU0EpUN26VPglZMsnmy4F7jazfE7hikn2fx7huXFV1Rf/zV0mOousatkqJiV25JEmSpJYyS4+pnQZskWSzJGvRJR9H3yK0ZAPgscBXRtrWTbL+2DLwJODcGT/vcayYSJIkSQtMVS1LshdwHN10wYdU1XlJXtlvP6jf9VnA8VX1x5HD7woc1Vd61gA+W1VfW9WYTEwkSZKkhmarK9d0qupY4NhxbQeNWz8UOHRc28XA1qs7HrtySZIkSWrOiokkSZLUUKuKydCYmEiSJEkNmZh07MolSZIkqTkrJpIkSVJDVkw6VkwkSZIkNWfFRJIkSWrJgglgYiJJkiQ1ZVeujl25JEmSJDVnxUSSJElqyIpJx4qJJEmSpOasmEiSJEkNWTHpmJhIkiRJDZmYdOzKJUmSJKk5KyaSJElSSxZMACsmt0qSP0yzfcMkrx5Zv3uSI/vlbZI89VZcc58kb5x5tJIkSdLwmZjMjg2BvyQmVXVFVT2nX90GmHFiIkmSpNumJLPymG9MTFZBkvWSnJDkjCTnJNm13/R+YPMkZyXZN8mmSc5NshbwbmC3fttu4ysh/X6b9stvS3Jhkm8A9xvZZ/MkX0vywyTfSXL/uXvWkiRJWp1MTDqOMVk1NwDPqqprk9wJOCXJ0cDewJZVtQ3AWKJRVX9O8g5g+6raq9+2z0QnTrId8DxgW7p/pzOAH/ablwCvrKqLkjwM+Cjw+Fl5hpIkSdIcsGKyagL8e5KlwDeAewB3XU3nfjRwVFVdV1XXAkdDV6UBHgkckeQs4GBgk1sEluyZ5PQkp/+/jy1ZTSFJkiRpdbNi0rFismpeCNwZ2K6qbkxyKbDODM+xjBUTxNHja4L9FwG/G6vGTKaqltBVVrhh2YTnkSRJkgbDismq2QD4VZ+UPA64T9/+e2D9SY4Zv+1S4CEASR4CbNa3nwQ8K8ntkqwPPB2gr55ckuS5/TFJsvXqe0qSJEmaU5mlxzxjYrJqPgNsn+R0uurJBQBVdRXw3X4g+77jjvkW8MCxwe/AF4GN+25ZrwJ+1J/jDODzwFn9Pt8ZOccLgb9PcjZwHrArkiRJmpfsytWxK9etUFXr9T9/Azxikn1eMK5py779t8BDx2170iTneC/w3gnaLwGeMrOoJUmSpOEyMZEkSZIamo/VjdlgVy5JkiRJzVkxkSRJkhqyYtIxMZEkSZIaMjHp2JVLkiRJUnNWTCRJkqSWLJgAVkwkSZIkDYAVE0mSJKkhx5h0TEwkSZKkhkxMOnblkiRJktScFRNJkiSpIS
smHSsmkiRJkpqzYiJJkiS1ZMEEMDGRJEmSmrIrV8euXJIkSZKas2IiSZIkNWTFpGPFRJIkSVJzVkwkSZKkhqyYdExMJEmSpIZMTDp25ZIkSZLUnBUTSZIkqSULJoCJiSRJktSUXbk6JiYLwAHfvbh1CIP3qnft1TqEwTvwnfu3DmHwvvLZd7YOYfCuu/Gm1iFonvvJ1de1DmHwHrzxeq1D0DyR5CnAR4DFwMer6v3jtu8EfAW4pG/6UlW9e2WOvTVMTCRJkqSGWlRMkiwGDgB2Bi4HTktydFWdP27X71TV027lsTPi4HdJkiRp4dkB+HFVXVxVfwY+B+w6B8dOysREkiRJaiiZncc07gH8bGT98r5tvEckOTvJV5M8aIbHzohduSRJkqSGZqsrV5I9gT1HmpZU1ZKxzRMcUuPWzwDuU1V/SPJU4MvAFit57IyZmEiSJEm3QX0SsmSSzZcD9xpZvydwxbjjrx1ZPjbJR5PcaWWOvTXsyiVJkiQ11Kgr12nAFkk2S7IW8Dzg6BXjyt3Sl3OS7ECXO1y1MsfeGlZMJEmSpAWmqpYl2Qs4jm7K30Oq6rwkr+y3HwQ8B3hVkmXA9cDzqqqACY9d1ZhMTCRJkqSGWt1gsaqOBY4d13bQyPL+wIQ3Mpvo2FVlYiJJkiQ15I3fO44xkSRJktScFRNJkiSpoUWLLJmAFRNJkiRJA2DFRJIkSWrIMSYdExNJkiSpoVazcg2NXbkkSZIkNWfFRJIkSWrIgknHiokkSZKk5qyYSJIkSQ05xqRjYiJJkiQ1ZGLSsSuXJEmSpOasmEiSJEkNWTDprFTFJMmmSc6drSCSfG+2zr2qRp97ku2T7Nc6JkmSJOm2ZhAVk6p6ZOsYVkZVnQ6c3joOSZIk3XY4xqQzkzEmi5N8LMl5SY5Pcrsk2yQ5JcnSJEcl2QggybeTbN8v3ynJpf3yg5KcmuSs/pgt+vY/9D936o89MskFST6T/l8qyVP7tpOT7JfkmMkCTbJPksP6OC9N8rdJ/jPJOUm+lmTNfr/tkpyY5IdJjkuyyUj72Um+D7xm5Lw7jV03yQ5JvpfkzP7n/fr2lyT5Un+di5L851QvapIDk5zev67vGmmf8PkmWTfJIUlO66+96wz+DSVJkjQwyew85puZJCZbAAdU1YOA3wHPBj4JvKWqtgLOAd45zTleCXykqrYBtgcun2CfbYHXAQ8E7gs8Ksk6wMHALlW1I3DnlYh3c+BvgF2BTwPfqqoHA9cDf9MnJ/8NPKeqtgMOAd7bH/sJ4B+r6hFTnP8C4DFVtS3wDuDfR7ZtA+wGPBjYLcm9pjjP26pqe2Ar4LFJtprm+b4N+GZVPRR4HLBvknWneS0kSZKkQZtJYnJJVZ3VL/+Q7oP/hlV1Yt92GPCYac7xfeCtSd4C3Keqrp9gn1Or6vKquhk4C9gUuD9wcVVd0u9z+ErE+9WqupEuYVoMfK1vP6c/5/2ALYGvJzkLeDtwzyQbjHten5rk/BsAR/TjTz4EPGhk2wlVdU1V3QCcD9xnijj/LskZwJn9OR44zfN9ErB3H/O3gXWAe48/aZI9+0rM6af8z8q8XJIkSWohyaw85puZjDH508jyTcCGU+y7jOVJzzpjjVX12SQ/oKtkHJfk5VX1zWmuswZwa17ZP/XXvDnJjVVVffvNI+c8b3xVJMmGQDG999BVYZ6VZFO6JGGq53ALSTYD3gg8tKquTnIo3es11fMN8OyqunCq4KpqCbAE4IMnXrwyz0eSJElqZlXuY3INcHWSR/fruwNjVYZLge365eeMHZDkvnSVgP2Ao+m6L62MC4D79gkAdN2kVtWFwJ2TPKKPbc0kD6qq3wHXJNmx3++Fkxy/AfDzfvkltzKGOwB/7K93V2CXvn2q53sc8NqRsTfb3sprS5IkaQAcY9JZ1Vm59gAOSnJ74GLgpX37B4AvJNkdGK2I7Aa8KMmNwC+Ad6/MRarq+iSvBr6W5DfAqasYN1X15yTPAfbru2
+tAXwYOK9/HockuY4uEZjIfwKHJflnVnyOM4nh7CRn9te8GPhu3z7V831PH+fSPjm5FHjarbm+JEmS2puP3a5mQ5b3cBq2JOtV1R/6D+MHABdV1YdaxzVbVufztSvX9H5+zZ9bhzB4B75z/9YhDN5XPjvd/B+67sabWoegee4nV1/XOoTB++uNnRNnOk9/8F0HlQk89L3fnpXPaqe9badBPc/prEpXrrn2D/2A7/PoulEd3DacWbfQnq8kSdKCZFeuziBusLgy+mrBChWDJC8F/mncrt+tqtcwMP2g/7XHNe9eVedMtP9Ez1eSJEm6rZo3iclEquoTdPccGbyqeljrGCRJkjQ8jjHpzOvERJIkSZrvzEs682mMiSRJkqTbKCsmkiRJUkN25epYMZEkSZLUnBUTSZIkqSELJh0TE0mSJKkhu3J17MolSZIkqTkrJpIkSVJDFkw6VkwkSZIkNWfFRJIkSWrIMSYdExNJkiSpIROTjl25JEmSJDVnxUSSJElqyIJJx4qJJEmSpOasmEiSJEkNOcakY2IiSZIkNWRe0rErlyRJkqTmrJhIkiRJDdmVq2PFRJIkSVJzVkwWgJ9cdUPrEAbvjrdfs3UIg/eVz76zdQiDt+sL3tU6hOG74z1bRzBs1/++dQSD9+SXPqt1CIP39kM+3jqEwbv+e//eOoQVWDDpmJhIkiRJDS0yMwHsyiVJkiRpAExMJEmSpIaS2XlMf908JcmFSX6cZO8Jtr8wydL+8b0kW49suzTJOUnOSnL66ngd7MolSZIkLTBJFgMHADsDlwOnJTm6qs4f2e0S4LFVdXWSXYAlwMNGtj+uqn6zumIyMZEkSZIaajRd8A7Aj6vq4j6GzwG7An9JTKrqeyP7nwLM6gwmduWSJEmSGlqU2Xkk2TPJ6SOPPUcuew/gZyPrl/dtk/l74Ksj6wUcn+SH4857q1kxkSRJkm6DqmoJXferiUxUpqkJd0weR5eY7DjS/KiquiLJXYCvJ7mgqk5alXhNTCRJkqSGGnXluhy418j6PYErxu+UZCvg48AuVXXVWHtVXdH//FWSo+i6hq1SYmJXLkmSJKmhRrNynQZskWSzJGsBzwOOXjGu3Bv4ErB7Vf1opH3dJOuPLQNPAs5d1dfBiokkSZK0wFTVsiR7AccBi4FDquq8JK/stx8EvAO4I/DRvqqzrKq2B+4KHNW3rQF8tqq+tqoxmZhIkiRJDWXC4R6zr6qOBY4d13bQyPLLgZdPcNzFwNbj21eVXbkkSZIkNWfFRJIkSWpoUZuCyeCYmEiSJEkNNZqVa3DsyiVJkiSpOSsmkiRJUkMWTDpWTCRJkiQ1Z8VEkiRJamiRJRPAxESSJElqyrykY1cuSZIkSc1ZMZEkSZIacrrgzm2+YpJkwySvnmafTZO8YCXOtWmSc1dfdJIkSZJgASQmwIbAlIkJsCkwbWIyE0msRkmSJGlayew85puF8OH5/cDmSc4Cvt637QIU8G9V9fl+nwf0+xwGHAV8Cli333+vqvredBdK8hLgb4B1gHWTPAc4BLgvcB2wZ1UtTbLxJO37AJsBmwB/Dfwz8PA+3p8DT6+qG5O8H3gGsAw4vqreeOteGkmSJLXmrFydhZCY7A1sWVXbJHk28Epga+BOwGlJTur3eWNVPQ0gye2BnavqhiRbAIcD26/k9R4BbFVVv03y38CZVfXMJI8HPglsA7xrknaAzYHHAQ8Evg88u6renOQo4G/6eJ8F3L+qKsmGt/6lkSRJkoZhIXTlGrUjcHhV3VRVvwROBB46wX5rAh9Lcg5wBF2SsLK+XlW/HbnepwCq6pvAHZNsMEU7wFer6kbgHGAx8LW+/Ry6LmfXAjcAH0/yt3QVl1tIsmeS05Ocfv7xX5hB+JIkSZpLmaXHfLPQEpOV/Td6PfBLusrK9sBaM7jGH6e5Xk3RDvAngKq6GbixqsbabwbWqKplwA7AF4FnsjxxWfFkVUuqavuq2v6BT/q7GYQvSZIkzb
2FkJj8Hli/Xz4J2C3J4iR3Bh4DnDpuH4ANgCv75GB3usrFrXES8EKAJDsBv6mqa6don1aS9YANqupY4HUs7wImSZKkeSjJrDzmm9v8GJOquirJd/tpfr8KLAXOpqtQvLmqfpHkKmBZkrOBQ4GPAl9M8lzgW6xYBZmJfYBPJFlK1+Vqj2naV8b6wFeSrENXeXn9rYxNkiRJA7Bo/uUQs+I2n5gAVNX4qYDfNG77jcATxu2z1cjyv/T7XQpsOcV1DqVLbMbWfwvsOsF+k7XvM259vUm27TBZDJIkSdJ8tCASE0mSJGmo5mO3q9lgYnIrJHky8B/jmi+pqme1iEeSJEma70xMboWqOg44rnUckiRJmv8smHRMTCRJkqSG7MrVWQjTBUuSJEkaOCsmkiRJUkNOF9yxYiJJkiSpOSsmkiRJUkOOMemYmEiSJEkNmZZ07MolSZIkqTkrJpIkSVJDi+zKBVgxkSRJkjQAVkwkSZKkhiyYdExMJEmSpIaclatjVy5JkiRJzVkxkSRJkhqyYNKxYiJJkiSpOSsmkiRJUkNOF9wxMZEkSZIaMi/p2JVLkiRJUnNWTCRJkqSGnC64Y8VEkiRJUnNWTBaA6/98U+sQBu8OG6/TOoTBu+5Gf4+mdcd7to5g+K66vHUEw7buRq0jGLzN77p+6xCGb511W0egGbJS0DExkSRJkhqyK1fHBE2SJElSc1ZMJEmSpIYWWTABrJhIkiRJGgATE0mSJKmhRZmdx3SSPCXJhUl+nGTvCbYnyX799qVJHrKyx94aduWSJEmSGmox+D3JYuAAYGfgcuC0JEdX1fkju+0CbNE/HgYcCDxsJY+dMSsmkiRJ0sKzA/Djqrq4qv4MfA7Yddw+uwKfrM4pwIZJNlnJY2fMxESSJElqaLa6ciXZM8npI489Ry57D+BnI+uX922sxD4rc+yM2ZVLkiRJug2qqiXAkkk2T9R/rFZyn5U5dsZMTCRJkqSGGt1f8XLgXiPr9wSuWMl91lqJY2fMrlySJElSQ4uSWXlM4zRgiySbJVkLeB5w9Lh9jgZe3M/O9XDgmqq6ciWPnTErJpIkSdICU1XLkuwFHAcsBg6pqvOSvLLffhBwLPBU4MfAdcBLpzp2VWMyMZEkSZIaatWFqaqOpUs+RtsOGlku4DUre+yqsiuXJEmSpOasmEiSJEkNNRr8PjgmJpIkSVJDKzFQfUGwK5ckSZKk5qyYSJIkSQ1ZMOmYmEiSJEkNLTIxAezKJUmSJGkArJhIkiRJDTn4vWPFZJYk2TTJuSuxzwtG1rdPst/sRydJkiQNixWTtjYFXgB8FqCqTgdObxmQJEmS5pYFk86CrZj01YoLkhyWZGmSI5PcPskTkpyZ5JwkhyRZu9//0iT/keTU/vFXffuhSZ4zct4/THKt7yQ5o388st/0fuDRSc5K8vokOyU5pj9m4yRf7mM7JclWffs+fVzfTnJxkn+c7ddKkiRJs2dRZucx3yzYxKR3P2BJVW0FXAv8M3AosFtVPZiuovSqkf2vraodgP2BD8/gOr8Cdq6qhwC7AWPdtfYGvlNV21TVh8Yd8y7gzD62twKfHNl2f+DJwA7AO5OsOYNYJEmSpMFZ6InJz6rqu/3yp4EnAJdU1Y/6tsOAx4zsf/jIz0fM4DprAh9Lcg5wBPDAlThmR+BTAFX1TeCOSTbot/1vVf2pqn5Dl/TcdfzBSfZMcnqS0y884cgZhCpJkqS5lFn6b75Z6GNMahX2H1teRp/gJQmw1gTHvR74JbB1v+8NK3GtiX6bxq75p5G2m5jg37GqlgBLAF76uXNm+jwlSZKkObXQKyb3TjJW+Xg+8A1g07HxI8DuwIkj++828vP7/fKlwHb98q501ZHxNgCurKqb+3Mu7tt/D6w/SWwnAS8ESLIT8JuqunZlnpQkSZLmD8eYdBZ6xeT/gD2SHAxcBPwTcApwRJI1gNOAg0b2XzvJD+gSuuf3bR8DvpLkVOAE4I8TXOejwBeTPB
f41sg+S4FlSc6mG9ty5sgx+wCfSLIUuA7YY9WeqiRJkoZoPiYRs2GhJyY3V9Urx7WdAGw7yf4HVNW7Rhuq6pfAw0ea/qVvvxTYsl++CNhqgn1upBvXMurb/bbf0lVgVlBV+4xb33KSWCVJkqR5Y6EnJpIkSVJT8UYmwAJOTEYrGiu5/6azFowkSZK0wC3YxESSJEkaAseYdExMJEmSpIbsydVZ6NMFS5IkSRoAKyaSJElSQ4ssmQBWTCRJkiQNgBUTSZIkqSEHv3dMTCRJkqSG7MnVsSuXJEmSpOasmEiSJEkNLcKSCVgxkSRJkjQAVkwkSZKkhhxj0jExkSRJkhpyVq6OXbkkSZIkNWfFRJIkSWrIO793rJhIkiRJas6KiSRJktSQBZOOiYkkSZLUkF25OnblkiRJktScFRNJkiSpIQsmHSsmkiRJkpqzYrIArLPm4tYhDN4Ny25uHYJuC67/fesIhm/djVpHMGx/vLp1BINXVa1DGL5rf9M6As2QlYKOiYkkSZLUUOzLBZigSZIkSRoAKyaSJElSQ9ZLOlZMJEmSJDVnxUSSJElqyBssdkxMJEmSpIZMSzp25ZIkSZK0giQbJ/l6kov6n7eY7z3JvZJ8K8n/JTkvyT+NbNsnyc+TnNU/njrdNU1MJEmSpIaS2Xmsor2BE6pqC+CEfn28ZcAbquoBwMOB1yR54Mj2D1XVNv3j2OkuaGIiSZIkabxdgcP65cOAZ47foaqurKoz+uXfA/8H3OPWXtDERJIkSWooyWw99kxy+shjzxmEddequhK6BAS4yzTPYVNgW+AHI817JVma5JCJuoKN5+B3SZIkqaHZqhRU1RJgyWTbk3wDuNsEm942k+skWQ/4IvC6qrq2bz4QeA9Q/c8PAi+b6jwmJpIkSdICVFVPnGxbkl8m2aSqrkyyCfCrSfZbky4p+UxVfWnk3L8c2edjwDHTxWNXLkmSJKmh2erKtYqOBvbol/cAvjJB3AH+H/B/VfVf47ZtMrL6LODc6S5oYiJJkiRpvPcDOye5CNi5XyfJ3ZOMzbD1KGB34PETTAv8n0nOSbIUeBzw+ukuaFcuSZIkqaEh3mCxqq4CnjBB+xXAU/vlk5kk/KrafabXNDGRJEmSGloN3a5uE+zKJUmSJKk5KyaSJElSQ1YKOr4OkiRJkpqzYiJJkiQ15BiTjomJJEmS1JBpSWfBdOVKsmmSaW/sMgvX/cMM998nyRsnaG8SvyRJkjQXrJhIkiRJDdmTq7NgKia9xUk+luS8JMcnuV2SbZKckmRpkqOSbASQ5NtJtu+X75Tk0n75QUlO7e9suTTJFn37i0baD06yeOyiSd6b5Oz+Onft2+6T5IT+HCckuff4YJNs1x/3feA1I+0TxiBJkiTNVwstMdkCOKCqHgT8Dng28EngLVW1FXAO8M5pzvFK4CNVtQ2wPXB5kgcAuwGP6ttvAl7Y778ucEpVbQ2cBPxD374/8Mn+up8B9pvgWp8A/rGqHjFdDNM+c0mSJA3SIjIrj/lmoSUml1TVWf3yD4HNgQ2r6sS+7TDgMdOc4/vAW5O8BbhPVV0PPAHYDjgtyVn9+n37/f8MHDNyzU375UcAn+2XPwXsOHqRJBuMi+1T08TAuOP3THJ6ktPP//oXpnlKkiRJaiWZncd8s9ASkz+NLN8EbDjFvstY/vqsM9ZYVZ8FngFcDxyX5PF0kykcVlXb9I/7VdU+/SE3VlWNXHOycT01bj0TtE0Vw/h9llTV9lW1/QN3/rspnqYkSZLU3kJLTMa7Brg6yaP79d2BsQrFpXRVEIDnjB2Q5L7AxVW1H3A0sBVwAvCcJHfp99k4yX2mufb3gOf1yy8ETh7dWFW/A65JsuPIPlPFIEmSpHkos/TffOOsXLAHcFCS2wMXAy/t2z8AfCHJ7sA3R/bfDXhRkhuBXwDvrqrfJnk7cHySRcCNdIPVfzrFdf8ROCTJm4Bfj1x31Ev7fa4Djpsqhhk9Y0mSJA3GfOx2NR
sWTGJSVZcCW46sf2Bk88Mn2P8CVqxEvL1vfx/wvgn2/zzw+Qna1xtZPhI4ciSeibpg7TOy/ENg65HN+0wVgyRJkjRfLZjERJIkSRqi+TiD1mxY6GNMJEmSJA2AFRNJkiSpIceYdExMJEmSpIZMTDp25ZIkSZLUnBUTSZIkqaH5eM+R2WDFRJIkSVJzVkwkSZKkhhZZMAFMTCRJkqSm7MrVsSuXJEmSpOasmEiSJEkNOV1wx4qJJEmSpOasmEiSJEkNOcakY2IiSZIkNeSsXB27ckmSJElqzoqJJEmS1JBduTpWTCRJkiQ1Z8VEkiRJasjpgjsmJpIkSVJD5iUdu3JJkiRJas6KiSRJktTQIvtyAVZMJEmSJA2AFZMFYP11FrcOYfAW+03FtH5y9XWtQxi8J7/0Wa1DGLzN77p+6xAGrapahzB4B75z/9YhDN5e73lt6xA0Q34K6ZiYSJIkSS2ZmQB25ZIkSZI0AFZMJEmSpIa883vHiokkSZKk5qyYSJIkSQ05B0/HxESSJElqyLykY1cuSZIkSc1ZMZEkSZJasmQCWDGRJEmSNE6SjZN8PclF/c+NJtnv0iTnJDkryekzPX6UiYkkSZLUUGbpv1W0N3BCVW0BnNCvT+ZxVbVNVW1/K48HTEwkSZKkppLZeayiXYHD+uXDgGfO9vEmJpIkSdJtUJI9k5w+8thzBofftaquBOh/3mWS/Qo4PskPx51/ZY//Cwe/S5IkSQ3N1tj3qloCLJn0usk3gLtNsOltM7jMo6rqiiR3Ab6e5IKqOmmGoQImJpIkSdKCVFVPnGxbkl8m2aSqrkyyCfCrSc5xRf/zV0mOAnYATgJW6vhRduWSJEmSWsosPVbN0cAe/fIewFduEXaybpL1x5aBJwHnruzx45mYSJIkSQ0NdFau9wM7J7kI2LlfJ8ndkxzb73NX4OQkZwOnAv9bVV+b6vip2JVLkiRJ0gqq6irgCRO0XwE8tV++GNh6JsdPxcREkiRJamg1TO17m2BXLkmSJEnNWTGRJEmSGrJg0jExkSRJkloyMwFu4125knw7yfb98rFJNlyN535lkhevrvNJkiRJC9mCqZhU1VNX8/kOWp3nkyRJ0sK0Gqb2vU0YXMUkyaZJLkjy8STnJvlMkicm+W6Si5Ls0N/M5ZAkpyU5M8mu/bG3S/K5JEuTfB643ch5L01yp375y0l+mOS8JHuO7POHJO9NcnaSU5LcdYo490nyxn7520n+I8mpSX6U5NF9++IkH0hyTh/Ta/v2J/Rxn9M/j7VHYvz3JN9PcnqShyQ5LslPkrxy5Npv6p/70iTvWq3/AJIkSVIDg0tMen8FfATYCrg/8AJgR+CNwFuBtwHfrKqHAo8D9u3vNvkq4Lqq2gp4L7DdJOd/WVVtB2wP/GOSO/bt6wKnVNXWwEnAP8wg5jWqagfgdcA7+7Y9gc2AbfuYPpNkHeBQYLeqejBd1epVI+f5WVU9AvhOv99zgIcD7wZI8iRgC2AHYBtguySPmUGckiRJGpBkdh7zzVATk0uq6pyquhk4Dzihqgo4B9iU7nb3eyc5C/g2sA5wb+AxwKcBqmopsHSS8/9jf4fKU4B70X3QB/gzcEy//MP+WivrSxMc90TgoKpa1sf0W+B+/fP7Ub/PYX3cY47uf54D/KCqfl9VvwZu6MfIPKl/nAmcQZe4bcE4Sfbsqy6nn/21z8/gaUiSJGkuZZYe881Qx5j8aWT55pH1m+livgl4dlVdOHpQutSwpjpxkp3oEoZHVNV1Sb5Nl9gA3NgnQPTXmMnrMxbj6HGZIJ7pfk9Gn+v412GN/vj3VdXBU52kqpYASwDe/L8XTvmaSJIkSa0NtWIyneOA16bPRJJs27efBLywb9uSrivYeBsAV/dJyf3puknNluOBVyZZo49pY+ACYNMkf9Xvsztw4gzOeRzwsiTr9ee8R5K7rMaYJUmSNJcsmQDzNzF5D7AmsDTJuf06wIHAekmWAm8GTp3g2K8Ba/
T7vIeuO9ds+ThwWR/n2cALquoG4KXAEUnOoauErPQMX1V1PPBZ4Pv98UcC66/2yCVJkqQ5lOU9l3RbZVeu6W24zlB7NQ7H2mvMw69e5th3Lvpt6xAGb/O7+j3KVPx/8vQOfOf+rUMYvL3e89rWIQzevk+736D+p3bez/84K3/8D7rHuoN6ntPx05gkSZLU0HycQWs2mJhMI8nbgOeOaz6iqt7bIh5JkiTptsjEZBp9AmISIkmSpFlhwaQzXwe/S5IkSboNsWIiSZIktWTJBDAxkSRJkpqKmQlgVy5JkiRJA2DFRJIkSWrI6YI7JiaSJElSQ+YlHbtySZIkSWrOiokkSZLUkiUTwIqJJEmSpAGwYiJJkiQ15HTBHRMTSZIkqSFn5erYlUuSJElSc1ZMJEmSpIYsmHSsmEiSJElqzoqJJEmS1JIlE8DERJIkSWrKWbk6duWSJEmS1JwVE0mSJKkhpwvuWDGRJEmS1JwVkwXgd9ctax3C4K231uLWIQzegzder3UIg/f2Qz7eOoThW2fd1hEM27W/aR3B4O31nte2DmHw9v/X/24dwuDt+7T9W4ewAgsmHRMTSZIkqSUzE8CuXJIkSZIGwIqJJEmS1JDTBXesmEiSJElqzoqJJEmS1JDTBXdMTCRJkqSGzEs6duWSJEmS1JwVE0mSJKkhu3J1rJhIkiRJas6KiSRJktSUJROwYiJJkiQ1lczOY9ViysZJvp7kov7nRhPsc78kZ408rk3yun7bPkl+PrLtqdNd08REkiRJ0nh7AydU1RbACf36Cqrqwqrapqq2AbYDrgOOGtnlQ2Pbq+rY6S5oYiJJkiQ1lFl6rKJdgcP65cOAZ06z/xOAn1TVT2/tBU1MJEmSpNugJHsmOX3ksecMDr9rVV0J0P+8yzT7Pw84fFzbXkmWJjlkoq5g4zn4XZIkSWpotqYLrqolwJLJr5tvAHebYNPbZnKdJGsBzwD+ZaT5QOA9QPU/Pwi8bKrzmJhIkiRJDaXRrFxV9cTJtiX5ZZJNqurKJJsAv5riVLsAZ1TVL0fO/ZflJB8DjpkuHrtySZIkSRrvaGCPfnkP4CtT7Pt8xnXj6pOZMc8Czp3ugiYmkiRJUkvDHP3+fmDnJBcBO/frJLl7kr/MsJXk9v32L407/j+TnJNkKfA44PXTXdCuXJIkSZJWUFVX0c20Nb79CuCpI+vXAXecYL/dZ3pNExNJkiSpIe/73jExkSRJkhqarVm55hvHmEiSJElqbsrEJMmGSV49zT6bJnnBdBfq95t0NH6SlyTZf7rzzNbxsynJo5Ocl+SsJPdIcmTfvlOSY/rlGcef5NIkd5qNmCVJkjQ3Mkv/zTfTVUw2BKZMTIBNgWkTk1aSDKG72guBD1TVNlX186p6TuuAJEmSpCGZLjF5P7B5/03/vv3j3H7qr91G9nl0v8/r+8rId5Kc0T8eOYN47pXka0kuTPLOscYkL0pyan+Ng5Ms7ttfmuRHSU4EHjWy/6FJ/ivJt4D/SLJNklOSLE1yVJKN+v0ma/92kg8lOSnJ/yV5aJIvJbkoyb/1+6yb5H+TnN2/JrsxgSQvB/4OeEeSz0xXOeqPuXOSLyY5rX88qm+/Y5Ljk5yZ5GAcKyVJkjT/DXO64Dk3XWKyN/CTqtoGOAXYBtgaeCKwb3/jlL2B7/TVgA/R3RVy56p6CLAbsN8M4tmBrrqwDfDcJNsneUB/nkf1cdwEvLC/9rvoEpKdgQeOO9dfA0+sqjcAnwTeUlVbAecAY0nPZO0Af66qxwAH0d1Q5jXAlsBLktwReApwRVVtXVVbAl+b6AlV1cfpblDzpqp64Uq+Dh8BPlRVDwWeDXy8b38ncHJVbduf896TnSDJnklOT3L6/33jiJW8rCRJkuaaeUlnJt2cdgQOr6qbgF/2VYqHAteO229NYP8k29AlEX89g2t8vZ8zmSRf6q+5DNgOOC3dlAW3o0t+HgZ8u6p+3e//+XHXOqKqbkqyAbBhVZ3Ytx8GHD
FZ+8jxR/c/zwHOq6or++tcDNyrb/9Akv8Ajqmq78zgeU7nicADs3yKhjskWR94DPC3AFX1v0munuwEVbUEWAKw5xHn1WqMTZIkSVrtZpKYrGzi9Xrgl3SVlUXADTO4xvgP0NVf97Cq+pcVgkmeOcH+o/44g+tO5E/9z5tHlsfW16iqHyXZju4GM+9LcnxVvXsVrzlmEfCIqrp+tLFPVEwyJEmSbkOcLrgzXVeu3wPr98snAbslWZzkznTf3p86bh+ADYArq+pmYHdg8Qzi2TnJxkluBzwT+C5wAvCcJHcB6LffB/gBsFM/7mJN4LkTnbCqrgGuTvLovml34MTJ2lc20CR3B66rqk8DHwAeMoPnOZ3jgb1GrrVNv3gSXVc3kuwCbLQarylJkiQ1M2XFpKquSvLdfrD2V4GlwNl039q/uap+keQqYFmSs4FDgY8CX0zyXOBbzKxycTLwKeCvgM9W1ekASd4OHJ9kEXAj8JqqOiXJPsD3gSuBM5g8CdoDOCjJ7YGLgZdO074yHkw3zubmPqZXzeDY6fwjcECSpXT/RicBr6QbU3N4kjPokqjLVuM1JUmS1MB8nNp3NqTKnkG3dY4xmd49N1y7dQiDt+3d1p9+pwXu717xkdYhDN8667aOYNiu/U3rCAZvr31e2TqEwdv/X/+7dQiDd/2Z+w8qE7j6uptm5bPaRrdfPKjnOR3v/C5JkiSpuTm/+WCSJwP/Ma75kqp61lzHsrolOQrYbFzzW6rquBbxSJIkSfPFnCcm/Yf02+QH9dtCciVJkiS1MOeJiSRJkqTlnC64Y2IiSZIkNeSsXB0Hv0uSJElqzoqJJEmS1JBduTpWTCRJkiQ1Z8VEkiRJasiCScfERJIkSWrJzASwK5ckSZKkAbBiIkmSJDXkdMEdKyaSJEmSmrNiIkmSJDXkdMEdExNJkiSpIfOSjl25JEmSJDVnxUSSJElqyZIJYGIiSZIkNeWsXB27ckmSJElqzoqJJEmS1JCzcnWsmEiSJElqLlXVOgYtQEn2rKolreMYKl+f6fkaTc/XaHq+RtPzNZqer9H0fI20MqyYqJU9WwcwcL4+0/M1mp6v0fR8jabnazQ9X6Pp+RppWiYmkiRJkpozMZEkSZLUnImJWrGf6dR8fabnazQ9X6Pp+RpNz9doer5G0/M10rQc/C5JkiSpOSsmkiRJkpozMZEkSZLUnImJJEmSpObWaB2AFp4k1wIBJhrglKpaf45DGhRfn+kluc9U26vqp3MVy1Al2RgY/V15P/AvwNVVdU2bqIbB35+V52s1tclen4X+uozyvUgz4eB3SfNOkqUsT97WBu4LXAQso3tfe3DD8JpL8mngUcDvR5o3By4GPlpVBzYJbCBGfn8WAQ8ALus33Ru4sKoe0Cq2ofFvbWoTvD6bAT/xd6jje5FmyoqJmkjyLOAxdG/m362qLzYOaTCS3BF4IXAN8Bm612idqvpj08AGpKq2Gl1P8mDgtVXlnYU7W1XVZqMNSc6oqoe0CmhIxn5/knwUeFlV/aBffzjwypaxDY1/a1Ob4PV5IPD6RuEMke9FmhETE825JP8BbAV8vm/aM8nDq+pNDcMakv8BTgPuAmxPV/L+MvDEhjENWlWdk+SRreMYkK+OS/5PBr7eNqRBelRVvXpspapOSXJwy4CGzr+1qVXV+X2Cq47vRZoRExO18FRg66q6uV8/NMk5gIlJZ42q+qcki4Azq+oPSTZsHdSQJHnDyOpi4CHA5Y3CGao9WZ78vwI4u2EsQ/WjJEuAw/v1FwIXNoxncPxbm1qSQ+i6ckHXNfBB+Lc2nu9FWmkmJmrhZuCOwK8Bktylb1PnrCSPq6pvJbm579q1ZuugBmZdug8DGwC/o6syHdkyoIGZLPl/c8OYhmh3um43/9qvnwB8sF04g+Tf2tSOGVleG7iR5YmufC/SDJmYqIX3AKcm+Q5daXcnrJaMehTw8iQ/pevOdQrwz21DGpwvA4cBG/Xr1wDnAWc1im
doTP5XziPovsG9tF9/Bd3f2wmtAhqgL+Pf2qSq6kvjmg5PcjL+Do3xvUgzYmKiOVdVRyY5Cdihb3pzVf2yZUwDs8vI8g1V9atmkQzXwcBeVfVdgCQ7AgfSfdDUxMn/G5tGNEwfBB5fVRcDJNkcOIKuu5I6/q1NYdx0wYuALYE7NwpniHwv0ow4XbDmXJLHTtReVSfOdSxD5bzvU0tyVlVtM13bQtZ/MzmW/J9m8n9LSc6uqq2na1vI/Fub2sh0wdBNoXwp8N6qOr1ZUAPje5FmwoqJWhgdTLk23RvWWcDjmkQzMFPM+74l8FG6bysXuouTvBP4VL++B/CThvEMSpI1gHuy/Hfo/km+TJfcXuLN3/7itCSfYMXfo9MaxjNE/q1N7aF0Y5Se3K+f3z+E70WaOSsmai7JJsB+VfXc1rEMQZKlE8yN77zvI5JsALwDGKu+nQS8y2pSJ8k36L54unak+dF0U3V+tqocnAskWZNuXMlOfdNJwIFVdWOzoAbGv7WpJTmIbsD7fsAX6arbT6iqv28a2ED4XqSZMjFRc0kCnO+dcjtJ/qOq3jJdmzSZJGdW1bbj2kxupdVs9Iuksb+7JD+oqoe1jm0IfC/STNmVS3MuyX4s75O7GNgGsD/uch8dN6ASui5cJNmkqq5sENOgJLkf3QDKTRl5H6squwN2Dp2g7bC5DmKoklzM8vegWxh/p+qFzL+1mekrTH62Wu7QCdp8L9Kk/ONRC6NJyDLg01X1vVbBDND/TNAW4MHAR4C/m9twBukLdGNtDsapJydyTJL/opva9UPAn+mmfVVn+/7na4A/0d38LcDzgA0bxTRU/q1N7dIk21TVWXS/O6fi9PejfC/SjNiVS00kWQe4P930gRdU1Z8ahzRoSdaoqmWt4xiKJCdX1Y6t4xiqJGfTfVO5CXA34OXAN6rqMS3jGpokp1TVw8e12Q1nhH9rK6+vLl1WVde3jmUofC/STFkx0ZxJ8jrgY3QDTT8KXEKXmGye5JVVdWy76IYjyUZ0M5SNThf87iTvAM6qqrPbRDYo3+x/n44AbhhrrKqrmkU0LH+sqg/BX/rA/znJ7VsHNUBJ8iLgc/368+nek7Scf2tT6L9kexUw9kH7pCQHVtUNUxy2kPhepBmxYqI5MzIw8HzgKVV1Wd9+b+A4B793kpxKN93k6Cwmu9F1qTi2qr7aJLAB6ccI3KLZsQGdJO8Ffgx8kq7r5HOBz1XV9lMeuMAkuS9d95JH0HXlOgX4p7EbLsq/tekkORz4A/Bput+hFwHrVtXzmwY2EL4XaaZMTDRnkpxXVQ9KctL4Mu5EbQvVRDOWOIuJZiLJtcC6wE10fbr/D3htVZ3SNDDpNsYbUE7N9yLNlF25NJfOSHIAcGp/E8HP9+3PB85sF9bgvG8l2xasJHsDn6+qS5I8n+4b74OqyhubAVV1h9YxzAf9Han/gVvOOPXSVjENjX9r0zoryZZVdS5AkgcDFzSOaTB8L9JMWTHRnOn74r6CbkacDcZvrqqnz31Uw9O/Ts/mlh+W3tUqpqFJcm5VbZlkM7pZzN4LvM5By50JpptegXdb7iT5PnAiXReTv8w4VVVfahbUwPi3NrUkJwMPA5bSjU/amu736QZwWmXfizRTVkw0Z/rBgB9pHcc88BXgKuAMwNldJjZ2Z+6nAYdV1eFJnKJzuf+h6+8+0TdPY1NPq/tCZO/WQQzc6N/aJ/1bu4V/bB3AwPlepBkxMdGcS7LHRO1VdViSp1fVRPfxWEjuXFVPbh3EwF2Z5EPAM4AnJ1kDWNQ4pqaSPKmqjgcYuxO1pvXNJM+sqi+3DmTAxv7Wng48xb+1FVXVGa1jGBrfi7Qq7MqlOdff+f0WzVX12iRvrqr/nPOgBiTJwcDB/g9vckk2BF4MLK2qbydZG7hnVf2kbWTtjM161zqO+WRkYO6NdANzoXsvWn/yoxYW/9am1s9alpGmop+1LMkxVfW0RqE143uRVoWJiQYhySOq6vut4x
iCJOcB96O7z8tYV65UlSVvTcqZ2zRbkjwAeALdh+4TqsrB3b0kG0/UXlW/TXKHqrp2ou23Zb4XaVXYlUtzLskjgeex4g0En5HkaODLVfWVNpENxi6tAxi6/pvu8f2WU1Xre6dqrSwH5k4vyXOBfwOOBPag6zr5+ar6TNvIBuMPwFOAa6rqxNENCzEpkVaViYla+BiwLyveQPAxwDHAhU0iGpCquizJtiy/k/B37Na1oqmmoFzASUmm30XjTDWezYG5nbcCO1bVr5PsAjwL+B5gYtL5Ml1XwI2SfBP4MPCJqnpWy6Aa871It5qJiVq4oaoOHW1I8vaq+mKjeAYlyVuBvwOOonuD/0SSL1TVe9tGNhxJHjtR+/hvLBcY71Y+Qw7MXSmLqurX/XKq6qYkazaNaFjuXlXb9GNvflBV+yS5Z+ugGvO9SLeaiYlaeM5Kti1ULwa2rqo/ASR5H3A23f0D1HnDyPLawA7AWcCCvWdAVT17bHmqme/mLqLhS/JXwKuBa4AP0X3zfRe7cK3gz0k2qqqrgXX6m+T+oHVQA3JhkvtX1QVJxu5DtU7roFpamfeikX19T9IKTEzUwlpJlgCbseLv4IL9UDnOpcBawJ/69bWAy5pFM0BV9YzR9SSbABPN9rZQbTeyvC6wM3AK4IeAFX0ROBTYBNgfeDnwKZZ3oxS8BlgPuBr4LN2kHHbjWu7OwJlJTgHuA5yG9+satQuwI/B1uh4AO9O9Rpf1674naQXOyqU5l+Rs4EBuebdlx1EAST5O98ForP/7M+jeyC8E7wA/kSQBzq+qB7SOZYiSbAB8anxCt9Al+V5VPbJfXlpVWyU5vaq2bx2b5ocko0nsDcBFfXVJQJJjgBdV1e/69Q2Bw6vKSV40ISsmauH3VXVQ6yAG7Pz+MebAVoEMVX8vnLEBlouBbegSXU2gqq5Jcvski6vqptbxDMi3krwU+CRwU9+1SyPGzYC3Vv/4I8vv17Gg7/lSVSe1jmHgNmd59R+65G3TNqFoPrBiojmX5N3Ab4Ej6N6kAKiqq5oFNUBJ1gdurqo/to5laJK8eGR1GXBpVX2vVTxDk2Qj4J0s75J0MvBOv8ld0cgNFm+iu8Hi/wGvrapTmgY2YEmeCjyyqt7eOpYhGJe4rUk35u2PCz1hG5Pk7cBz6WYvK+DZwBFV9e6WcWm4TEw05/o75d6iuao2m/NgBijJpsCn6cbgbEQ3NeeeVeVMJyP6Qab3p/uf3QVjkwUIknyZroL06b5pd+AhC3wKU60mSc6qqm1axzFEJm63lOQhwKPoEriT7batqZiYSAOT5KvA/6uqI5OcQfdt0/72yYUkr6O7D85OwEfpBuIWXXeBV1bVsc2CG5AkZ1fV1tO1LXRJ1gC2YsWbvb4f+BfgEmfngiTPHlldTDexwqPHxubolkzcVpTkAcDj+9UTquqClvFo2BxjojnXf9P9KpZ3MzkJOLCqbpj8qAVlk6o6sl9OVf0kyZ2aRjQce1TVh5PsS/fh6DKAJPcGjgNMTDp/TLJTVX0bIMnj6MYFaEVfo/uw/fuRtvvTTUf9WWDBJybA34wsL6ObNXDXNqEMzySJ23WNwhmcJM8F/g04EngJ8OQkn68qZ3bThExM1MIngD/Q3SE3wIv6tuc3jGlIVvi7TLID/o9uzFr9z9+MJSUAVXVZkl9PcsxCtCdw2EhC+1u6++NoRXesqm1HG5KcUVVPbxXQ0FTVy1rHMHAmblN7K7BjVf06yS7As+i6J5uYaEImJmrhAePK3N9OclajWIboc0m2qqqldIMp3wf8Q+OYhuKM/gZvpyb5NPD5vv35wJntwhqWqjoX2C7JenR37r62dUwDdegEbd5XYUSS+9Ddl+ORdF8kfR/Ya/SLgYUmyVur6t/BxG0lLKqqsS+NUlU3JVmzaUQaNMeYaM4lORT4QP/hiSQPBt5WVc9rGpgGr+8G+Apge2CD8Zv9prvTz8q1D113ycJZuSaU5LETtVfViUm2q6ofzn
VMQ5PkBLqK9uF90/OBl1TVE9tF1VaSM8dX2jSxJKcBT6qqq5P8H/BNumTlVY1D00CZmGjOJTkZeBiwlO5D09Z0MwjdAFBV3gFet5DkJVV1aOs45gNn5Vo5SY6eqLmqnp7kw1X1urmOaWgmGsi90Ad39939HtI6jvmg74r8i7677TuAi4HPlB8+NQkTE825furASTmVoCbit5Qrz1m5tLok+TrdeIDRJPcFVbVzu6ja8r1oZpLcje7LSIAfVNUvWsajYVvUOgAtPH3icRlwd2AT4KdVdcbYo210GjC/RVl5f0yy09iKs3JNLMlGST6S5MwkZyTZr+8Gp+VeCjwd+DlwBfCMvm0h871oJSXZg25c0rP6xw+SLPTfH03BionmXP+B6VDgu8CTgPOBf6uqr7eLSkNn94mVl+RBwCeB0Vm5dh8b16WOXd50ayT5l6p6X+s45oMkFwIPHxvflmRj4PtVdb+2kWmoTEw055KcCjy/vz/HGXR3hD3BG3ZpKiYmM9fPypWq+v20Oy9AdnmbXpJ3TrW9qt41V7EM0WSvT1W9K8krqurguY5pSJJ8i27w+439+prA8Y4l1WScLlgtrF1VP+mXU1XXJ1lryiOkrsKmlTD+w1ISwA+RE/BGlNMzqZ3aVK+Pv0twFnBCkiP69d2A8/suXlSV03NrBVZMNOf6e5Y8ok9IzqXrcrJdVe3WNjLNF1N9SznXsQxRkn8eWV0b2AW4uKpe0iaiYbLL28pLsgFQ3hNHM5Fkv6k2V9Vr5ywYzQsmJppzSZ4GXFhVFyX5GN1A+A9WlXc310rxg/fMJFkEfLWqntw6liGyy9vk+lkUDwE27JuuAV7mPV46Sb5Jd+PJFdhVSbp1TEwkzXt+8J5cun5cDwa+XFX3bR3PkFh5m15f4X51VX2vX98R2H8h38dk1Ljp79cG/ha4qar2bhTSoCQ5hIkTN2fm0oQcYyJpXus/eG8JbNE6lqFIci3dh4EC1gRuBl7UNKhhGq2QrA38DXBho1iG6o9jSQlAVZ2cxLETvQmmuP9+kh80CWaYjhlZXhd4LnBRo1g0D1gxkTTvTPbBu6qOahrYQCV5EvCEqnpL61iGLMkawNfthrNcf7fu29PdZBG6KZUBDgCoqp+2iGsoktxxZHUxsB3woaq6f6OQBi/JN6vq8a3j0DCZmEia9/zgPb0kS6tqq9ZxDFn/IfMHVfVXrWMZiiRLp9pcVQ+es2AGKMnFLP+SZBnwU2CfqnIWwRFJ7gBQVdcm2RfYu6puahyWBsjERNJtgh+8lxs3dmIR3RiTdarqqY1CGqT+Q/dY//dFwF2Bd1XVf7eLSrrtSLIVcBiwUd90DfDiqjq7XVQaMseYSJp3JvngfXmjcIZodOzEMuBQ4H/bhDJoTxtZXgb80m9xV5Rkb+DzVXVJkucDjwAOqqrzG4c2CEkWAy8HnkRXNTkBOLiqbm4a2HAcDOw1VkHqJ084iO73SLoFKyaS5p1x0wUvAy4F/tcPlZ0kdwOurarr+g9Od6iqq1vHpfknyblVtWWSzYD/Ad4LvK6qHtY4tEFI8kHgXnQftgO8Eri0qt7UNLCBSHLW+BncJmqTxpiYSJqXkqwJ3I/uW8oLq2pZ45AGo58V6GnA74BTgdvRTRfsFKaakSRnVtW2SV5L1x1w3yRnVNVDpj14AUhyHrDV2Jci/SyBZ9uttJPkS8DZwKf6pj2ALavq2e2i0pAtah2AJM1U32/5PLpuAt8DThx3P4GFbp2q+jXwBOD0foagpzeOSfPTlUk+BLwOOKqfuczPDsv9ebRSW923vXbjWu6lwB2AL/SP9YGXNY1Ig+YYE0nz0X/TDaA8JckZwDOALwI7NY1qQJJsDDwf+HLfdGO7aDSPvQB4MfD3VfXjJGsDftu93MeSbDTWVTLJhsDH2oY0HFV1DfCG0Vm5GoekgTMxkTQfbVBVp/TLqaqrkqzbNKJh2Rf4EV0XimP6DwXHtw1J81FV/Q7Yb2T9T8BPmgU0MF
X10SRrJ9ma5d1KD2gd11CMzMq1MVBJrgH2qKqzmgamwXKMiaR5J8k5wLZVtSzJ2XQDcl9aVbs0Dk3SApLkccAn6O5fUsB96d6LTmga2EAk+T7wxnGzcu1bVc7KpQlZMZE0H30Y+GvgfOAK4MnASxrGI2lh+iDw+Kq6GCDJ5sARgGPeOrcbvdlkVZ2c5HYtA9KwWTGRNG8l2YBuvKn9liXNuSRnV9XW07UtVM7KpZlyZg1J806ShyQ5CzgLWJrk7CTbtY1K0gJ0WpJPJHl8/zgMOK11UAPirFyaESsmkuadPil5dVV9r1/fEdjfm3ZJmkv9/ZRewfIZAU8CDqwqZ8GTbgUTE0nzTpLvVtWjpmuTJLWT5JtAxrdX1eMahKN5wMHvkuaj45K8H/hMv7478N0k9wGoqp82i0zSgpHkYib+4L1Zg3CG6I0jy2vT3XNq7UaxaB6wYiJp3kmydKrNVfXgOQtG0oLV38h0zNrA04H7VNXbGoU0eElOrKrHto5Dw2TFRNK8U1VbtY5Bkqrqt+OaliT5XpNgBijJaAKyCHgwcKdG4WgeMDGRNO+M+5/dX1TViUm2q6ofznVMkhaeJHuMrI598Paz1XJvGFleG9gS2LVRLJoH/OORNB+9YYK2ACfSjTcxMZE0F0anKV8beBjdFLkCquoZo+tJ7gIcCHgfE03IMSaSJEmrQZLbA1+qqqe0jmWIkqwFnFNV92sdi4bJiomkeSfJw4E3A9cA/wr8DnhAVXljM0kt3R7YvHUQQzFuuuDQvTaHNgtIg2fFRNK8k+RCYG/gHsDjgecA3/E+JpLm0sh0wUU3xuR2wDuq6uCmgQ1EkoeMrC4DflZVV7eKR8NnxUTSfPSrqjoKIMkrqurmJOu0DkrSgrP9yPKyqrq2WSTDdCe6rltXJtkCeFySr1XVda0D0zAtah2AJN0K30yyT39DxUryBOD61kFJWlj66YK3A94K/GuSnRuHNDT7Ar9NsiHwNeDJwBeaRqRBMzGRNB/tDrwY+BZd5fdVOBOOpDmW5LXAu4EL6Waaek6SN0591IJyc1X9CXgq8IWqegVdF1xpQnblkjTvVNV9W8cgScA/AI+oqj8meVVVvSLJqcAHWgc2EH9Osivdl0f/0rctbhiPBs7ERNK8M+6mZn9RVYfNdSySFraq+mO/mCQB1moZz8C8Cng7cFxVnZxkfeA9jWPSgJmYSJqPxt/U7AnA2YCJiaS59Pskd6+qK4B1gaOBoxrH1FSSL1XV3wJU1RnA345tq6rfA0e0ik3D53TBkua9JOsBR1TVLq1jkbRwJNkM+ENV/TrJS4GLqurk1nG1lOSMqnrI9HtKt2TFRNJtwY2A404kzamqumRk+RMtY5FuC0xMJM07SY5m+d2EFwMPBD7fLiJJUi/T7yJNzMRE0nw0OuPNMuCnVfXzVsFIkv7CMQK61RxjImleSHJKVT28dRySpMkl2bmqvj7F9r+vqv83lzFp/rBiImm+cApOSRq40aQkySHcsmvXM5JsD3xmoU8UoFsyMZEkSdJsOGaCtscApwAHAVvObTgaOhMTSfOFAyolaR6pqi+Nb0vyrKo6LMk/tYhJw+YYE0nzgnPjS9L8kuQ+E7VX1U9Hbkwp/YWJiaR5IcmDquq8kfWtgGur6tJ2UUmSJpNk6UTNVfXgJF+oqr+b86A0aCYmkuadJJ8AtgHWAz4IfA7496p6dcu4JEnSrecYE0nz0Q50gybXB75VVQf1s7xIkgYiyWOn2l5VJ85VLJofTEwkzUcXA3epql8mWSPJIuB2rYOSJK3gDVNsC2BiohWYmEiaj/4EnJ3kq8DdgBOAI9uGJEkaVVXPaB2D5hfHmEiad5K8eGT1BuD8qjq3VTySpFtKssdU26vqsLmKRfODiYmkeSnJOsD9gQIuqKo/NQ5JkjQiyX5Tba6q185ZMJoXTEwkzRtJXgd8DNgJ+ChwCV1isjnwyqo6tllwkiRplZiYSJo3kpxZVdsmOR94SlVd1r
ffGziuqh7QNkJJ0pgkdwQ+AjyJ7kukbwD/VFW/aRqYBmtR6wAkaQbW6n/+ZiwpAeiXf90mJEnSJA4AzgLuCfy8Xz+oZUAaNismkuaNJJ8CrgWup5uN6/P9pucDv66qf2oVmyRpRUnOrqqt++WxivcPquphrWPTMDldsKT55B+AVwDbA3fo18es3yQiSdJkFo+uJLlXq0A0P1gxkSRJ0mqX5KPAwVV1dpLLgD8AL6+q7zUOTQNlYiJp3umnCv5X4Ml0AyqPB95bVdc1DUySNKEk61XVH1rHoWFz8Luk+ejDdF25ng+sDZwH/HfLgCRJK0qycZIPJzkD+E6SjyTZuHVcGi4TE0nz0SOr6rVVdRFwU1V9FtiydVCSpBUcTjdj4rP6x6/7NmlCDn6XNO8l2QDfzyRpaO5YVe8dWf+3JD9sFo0Gz4qJpPno0iTb9MsbAqcC+7QKRpI0oROT7DK2kuSpwGkN49HAOfhd0ryW5H7AZVV1fetYJEnLJbkEuDfwO7qJSjYCLuuXU1WbtYtOQ2RiIkmSpNVuuoHuVfXbuYpF84OJiSRJkqTmHGMiSZIkqTkTE0mSJEnNmZhIkiRJas7ERJIkSVJzJiaSJEmSmvv/RUlMvG5pmakAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "\n", "# Reshape the flattened correlation values into an 8x8 matrix (one row and column per feature)\n", "matrix_np = matrix_np.reshape(8, 8)\n", "\n", "# Plot the correlation matrix as a heatmap, labelling both axes with the feature names\n", "fig, ax = plt.subplots(figsize=(12, 8))\n", "ax = sns.heatmap(matrix_np, cmap=\"Blues\", ax=ax)\n", "ax.xaxis.set_ticklabels(features, rotation=270)\n", "ax.yaxis.set_ticklabels(features, rotation=0)\n", "ax.set_title(\"Correlation Matrix\")\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## An example of cross-validation and grid search" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# This cross-validation step can take several minutes, depending on the number of available cores\n", "\n", "from pyspark.ml.tuning import ParamGridBuilder, CrossValidator\n", "from pyspark.ml.evaluation import RegressionEvaluator\n", "\n", "# Grid of candidate values for the regressor's maxIter parameter\n", "paramGrid = ParamGridBuilder() \\\n", "    .addGrid(regressor.maxIter, [100, 50]) \\\n", "    .baseOn({regressor.labelCol: \"median_house_value\"}) \\\n", "    .build()\n", "\n", "# 4-fold cross-validation of the whole pipeline over the parameter grid\n", "crossval = CrossValidator(estimator=pipeline,\n", "                          estimatorParamMaps=paramGrid,\n", "                          evaluator=RegressionEvaluator(labelCol=\"median_house_value\"),\n", "                          numFolds=4)\n", "cvModel = crossval.fit(train)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Root Mean Squared Error (RMSE) on test data = 51711.3\n" ] } ], "source": [ "from pyspark.ml.evaluation import RegressionEvaluator\n", "\n", "# Score the held-out test set with the best model found by cross-validation\n", "predictions = cvModel.transform(test)\n", "evaluator = RegressionEvaluator(\n", "    labelCol=\"median_house_value\", predictionCol=\"prediction\", metricName=\"rmse\")\n", "rmse = evaluator.evaluate(predictions)\n", "\n", "print(\"Root Mean Squared Error (RMSE) on test data = %g\" % rmse)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "spark.stop()" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" }, "sparkconnect": { "bundled_options": [], "list_of_options": [] } }, "nbformat": 4, "nbformat_minor": 2 }