{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sklearn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## sklearn.linear_model" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from matplotlib.colors import ListedColormap\n", "from sklearn import model_selection, datasets, linear_model, metrics\n", "\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "%pylab inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Linear regression models\n", "\n", "#### Data generation\n", "\n", "We build a dataset with 2 features: one is informative and the other is redundant. Besides, adding **coef=True** we ask return us the approximation function coefficients into the **coef** array, not only data." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "data, target, coef = datasets.make_regression(n_features = 2, n_informative = 1, n_targets = 1, \n", " noise = 5., coef = True, random_state = 2)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Coefficients: [38.07925837 0. ] intercept: -0.13052467965349365\n" ] } ], "source": [ "print('Coefficients: ', coef, 'intercept: ', linear_regressor.intercept_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We plot/draw a dependence of features and a target label.\n", " - Red line: y=f(feature1)\n", " - Blue line: y=f(feature2)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAD4CAYAAAAEhuazAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAfF0lEQVR4nO3dfYwd13nf8e+zK66UFRNEumRrReTuqoEcRBZatSIEBwYCNTIqWQmiuIgLMStHgY1ss1EEp60RW94/7KJgYThoWiS2k1CxYVt3K1VAGlhw7DhSXmCjsKtSraxIVpgwJpekZUQiJcOS2crm7tM/Zi737t2ZuTP3zvv8PsBgd+e+zLkzs8+c+5wz55i7IyIi3TJTdQFERKR8Cv4iIh2k4C8i0kEK/iIiHaTgLyLSQZdVXYC09u3b50tLS1UXQ0SkUZ566qlz7r5/dH1jgv/S0hLHjh2ruhgiIo1iZhtR65X2ERHpIAV/EZEOUvAXEekgBX8RkQ5S8BcR6SAFf5GcrK/D0hLMzAQ/19erLpE0WsEnVGO6eorU2fo6rKzAhQvB3xsbwd8Ay8vVlUsaqoQTypoypPOhQ4dc/fylrpaWgv/PUYuLcOpU2aWRxsvxhDKzp9z90Oh6pX1EcnD6dLb1IolKOKEU/EVysLCQbb1IohJOKAV/kRwcOQLz8zvXzc8H60UyK+GEUvAXycHyMhw9GqRkzYKfR4+qsVcmVMIJpQZfEZEWK7TB18w+aWYvmtmzQ+s+ZGbfNLOnw+XOocceMLMTZnbczG7PowwiIpJeXmmfTwF3RKz/z+5+U7h8HsDMbgDuBt4UvubjZjabUzlERCSFXIK/u38JeDnl0+8CHnH31939JHACuCWPcoiISDpFN/j+mpk9E6aFrgrXXQucGXrO2XDdLma2YmbHzOzYSy+9VHBRRUS6o8jg/7vAjwI3Ad8C/lO43iKeG9nq7O5H3f2Qux/av3/XLGQiIjKhwoK/u/+9u2+6+xbwINupnbPAwaGnHgBeKKocIiKyW2HB38yuGfrz7cCgJ9BjwN1mdrmZXQdcDzxZVDlERGS3XEb1NLOHgVuBfWZ2FvggcKuZ3USQ0jkF/GsAd3/OzB4Fvg5cBO5z9808yiEiIunoJi8RkRbTqJ4iInKJgr+ISAcp+IuIdJCCv4hIByn4i4h0kIK/iEgHKfiLiHSQgr+ISAcp+IuUYH0dlpZgZib4ub5edYlkhw4eoFyGdxCReOvrsLICFy4Ef29sBH+D5vithY4eIA3vIFKwpaUgnoxaXIRTp8oujezS8gOk4R1EKnL6dLb1UrKOHiAFf5GCLSxkWy8l6+gBUvAXKdiRIzA/v3Pd/HywXmqgowdIwV+kYMvLcPRokEI2C34ePdrqtsRm6egBUoOviEiLqcFXREQuUfAXEekgBX8RkQ5S8BcR6SAFfxGRDlLwFxHpIAV/EZEOUvAXEekgBX8RkQ5S8BcR6SAFfxGRDlLwb5gOzjYnbaeTuhK5BH8z+6SZvWhmzw6tu9rMHjezvw1/XjX02ANmdsLMjpvZ7XmUoQsGs81tbID79mxz+l+RxtJJXZlcRvU0s58EXgM+4+43hus+Arzs7h82s/cDV7n7+8zsBuBh4BbgR4AngDe6+2bSNjSqZ+tnm5Mu0klduEJH9XT3LwEvj6y+C/h0+PungZ8bWv+Iu7/u7ieBEwQXAhmjo7PNSZvppK5MkTn/f+ju3wIIf/6DcP21wJmh550N1+1iZitmdszMjr300ksFFrUZOjrbnLSZTurKVNHgaxHrInNP7n7U3Q+5+6H9+/cXXKz66+hsc9JmOqkrU2Tw/3szuwYg/PliuP4scHDoeQeAFwosR2t0dLY5aTOd1JXJbRpHM1sCPjfU4PubwPmhBt+r3f03zOxNwH9lu8H3z4Dr1eArIpK/uAbfy3J684eBW4F9ZnYW+CDwYeBRM3s3cBp4B4C7P2dmjwJfBy4C940L/CIikq9cgr+7H4556LaY5x8BlNQTEamI7vAVEekgBX8RkQ5S8BcR6SAFfxGRDlLwFxHpIAV/EZEOUvAXEekgBX8phebrqBEdDCGnm7xEkgzm67hwIfh7MF8HaAiX0ulgSCi3sX2KprF9mkvzddSIDkbnFDqZi0gSzddRIzoYElLwl8Jpvo4a0cGQkIK/FE7zddSIDoaEFPylcJqvo0Z0MCSkBl8RkRZTg6+IiFyi4C8i0kEK/lIa3VhaMzognabgL6UY3Fi6sQHu2zeW1iXedC4ORh2Qe+6Bffs68OEF1OArJanzjaWjIx5A0Pux1Z1g4g4IdODDd0tcg6+Cv5RiZiaoYI4yg62t8sszrM4XpsLEHZCBVn/4blFvH6lUnW8s7eSIB+N2fKs/vICCfyE6lz9Ooc43ltb5wpRJlhMv6oAMa9yHl8zcvRHLzTff7E3Q77vPz7sH36mDZX4+WN91/b774qK7WfCzLvukFcdskg/R77v3ejtf08gPL0mAYx4RUysP6mmXpgT/xcXd/0sQrJf6quuFKbVpTrzGf3hJEhf81eCbszo3bEqL6cSTGGrwLUlr8sfSLDrxJCMF/5zVuWFTmiGx3TbuQZ14klHhc/ia2SngVWATuOjuh8zsauC/AUvAKeBfufsrRZelDIP7YtbWgt5yCwvB/5/ul5E0EqfYJcX8uzrxJKXCc/5h8D/k7ueG1n0EeNndP2xm7weucvf3Jb1PU3L+kt76umLVqMQbzoh5sNeDvXu1IyVSXM6/8Jp/jLuAW8PfPw38JZAY/KVdEmu4HY5byTecxTx4/nywgHakpFZGzf8k8ArgwO+7+1Ez+7a7//DQc15x96siXrsCrAAsLCzcvBE3Fok0TieHVEhhopp/lK7vSLmkyt4+b3H3fwa8DbjPzH4y7Qvd/ai7H3L3Q/v37y+uhFK6uBruxkaxd0WPtpf+6q/W4G7sQaHMOHL6Hub57o6HL7Xb3nln0HUzDQ3PIONEdf4vagE+BLwXOA5cE667Bjg+7rVNuclL0om7J6nIm0yjboKt/ObWiEL1OeyLnHRj0xd7rwbliSq8mfuVV+quQklEzE1ehdb8zexKM/vBwe/AvwCeBR4D7g2fdi/w2SLLkZXG5ineuKFlLlwIGoPztLa2c9jmsrabKKJQyzzMKa5ji1lO7b0xSN1HFd4drrhCXTxlMlFXhLwW4B8BXwuX54C1cH0P+DPgb8OfV497r7Jq/q0Y56UhBqMKxNXCzfLdnllyrX/S7aYaHSHuSWkKlfQ8Mw3PIInQ2D7paGye4o3GqqixxYrY5+NSTZNsN1VlIS7f1OvFf/jBMjubXHidmDJGXPDXHb4jyhzbvYvppajZA7/zHZib2/m8IjIX41JNk2w3KhuzK3UUl286fx5eGXNv4+Zm8FN38Ereoq4IdVzaVvPvanopbv/2euVkLka/dayuTrfdpGzM2Cdl/Rqi9I5MAI3qmU5Z87l2tZ972wafTHUck+bLTaK5dCUHGtUzpeXl4P9tcTEISIuLxfz/dXLqQOIHmXRvZuorVTYmTb5pVFEnnshA1NeBOi5t6+ff1fa7cX3tm5j6St3bZ1zjbldOAikVavCtl6623w1/s4pSej/7HCwvBymera3gZ2RlfXkZzp0LBmFL0oWTQGpBwb8iZaWX6mgQLONGKqhr6itV76xxT3r55fgNdOkkkOpFfR2o49K2tI9Mn/oqs/NLf/XLPm/fzd6ff/RJXc33SWVQ2kfqZprUV9T9AisrkzcYj5s9a+13D3LBdxY2VX/+0Sd1Nd8n9RN1Rajjopp/sarqQj7pdvOsQI+tsPd6bmxG9+dna/sDxDXgjo4Xof76UiJU85c4edeiy5A0JHSmcq+vs3bv2eQK+/nzLMRMpLJgZ7Z3XJzR/q2pWoi7eQe4lCjqilDHRTX/4lSVhp7mLuekcXpSdxcNCxBbqx9U2AmGWZ7ntZ3b4TXvczi+IOC+Z89ENft+331ubudbzc3pS4JkhwZ2kzjjhijIkqXI8txpLjrj7hdIdeEKC7DIyej34GQwsFq4Ysc4+5wcH/iniNhxtwT0epnfSjpOwV9iJQXhLLXzrDX5VOPiJOj342NuqvcYCurjavUTBf4pvkIlvZ1IFgr+EispaGepnWetyeeRbpr4Pfr9HVefpOAedXGY4/95jxfTXQwmmJggKfgr9SNZKPh3UB7pmiy186w1+TxGNp34PdIO7k98Wijpm8K0Nf+kkSCacEuAOjTVh4J/x+Q1ZHSRNf9BOacNEhO9R4ZhluMahHd9Tk5FXBUmG6xo6pRWhbo6XHldKfh3TF49eIrM+Zci7sqQc80f3I3NXKu8STOc1blmrZuY60XBv2OmbUwdVlRvn8IlXY3GdRcKlz6HvceLDlvja/6zZ0op/upqDS+yQ/I892R6Cv41UGZgbHLta5r9tOO1M6ej8/CDnXDbbYnRfJXfiUj5bPneK77ne/i/u9bDlvd6+R7XqH1R92Nb9/J1jYJ/CZKCVtkpkVqmYFKYptyRr41riE1Kqoc1/rhc/+Kie/+2T8R+I5jwvq7U+yc27VSTmnVTz722UvAv2LgTvoraUK1SMClNs59iX8vJnStmZhK70/Q57LN8Pz7IsuU+P5/YFlDEcc3lxraSNPHca6u44K85fHMybi7Xts1dW5Rp9lPsa9lii9lU21/nMCs8yAWujH3O4uxZTm0eZIZN4obHKuK4Jk0FrOl+JY7m8C3YuDl54+aujVvfVdPsp9jXxgzKFmWN/5gY+M3gyOb7xr5vEcc1aZIbBX7JSsE/J+OCloZxT2ea/XTkCMzPXdz5Wr7LET6QevuniY/ahvMrvwLLi/8j2B4fYI7Xdz1vZqaY4xp3ji0uKvBLdgr+ORkXtOo+bWNdhg+eZj8ts85R/2UWOYWxxSKnOMovs8zDu5+8dy/Mze1avbD3lcj3np3Z4qG+8fGPc+lgL/Mw7+ZBYGeu6bLLUnzQCagCIbmKagio41L3Bl/35jZytaZ3RoYbt3zPnqDD/MgBS70vwoMdOyLoYjEfsannmFQHNfhKnHGN1bW1vh7MuHL6dJATiWsNjRPzAQdvu7EBs7OwuRk89ciR3d9A1JAvdVe7Bl8zu8PMjpvZCTN7f1XlkPGN1bWzvg779sE99+ycfsws3cs5zBInmdn4RmSKa3l5O8WyuRmsi5vdTA350lSVBH8zmwU+BrwNuAE4bGY3VFGWronK7TcqgA3mnDx/fvdjUVXw+Xno9bZfHnbl3GAJZyY2qKeZix3yzcPXpd1FOiIqF1T0AvwE8MWhvx8AHkh6TRNy/nXX1LFidkiT1+/1dibFhz542hx9lvFp8hqZtDHHQBqFOt3hC/w88AdDf78T+GjE81aAY8CxhYWF4vZOR4ybsasRDYlphmKOam0NP+DY+XpDZd+RrfFwpChxwb+qnH9UcnbXd3Z3P+ruh9z90P79+0soVrsl5faXl4O2z62t4GdeXVBzSWUMv8lMilM26oOGH3BhMfr1oymusrtVNq7dRRqvquB/Fjg49PcB4IWKytIZZef2B+n54TbZqPx6pjcZtMAmSfhAaYN62fdlNKrdRdoh6utA0QtwGfAN4DpgDvga8Kak1yjnP72y88q5pDKy9N1P+YHqmOJSzl+KQp1y/kF5uBP4G+DvgLVxz1fwz0eZgS+XST2ScvyDC8Ps7PbfDY6WdbwoSfPFBf/K+vm7++fd/Y3u/qPu3ugb1JvURa+o3H6Uq6+OXh+ZyhjsRLNgfITBT99uCrrUP59NlmbPsH7kVPD4xYvBz6I/UMHKPDYiGttnSrnktVtmcA9WVFf8ubmR/ProDVuwndcfyu/v6p+/eaCR+7lJFQVpNw3vMKXGDo1QkMHFcPQGqYHe5a9y7uJVQWCfmQlq+GMacdc5zL18hk12j5jW68G5c3mUvHhR+0bj8EvRaje8Q1uoi95OUXfGDnv59Su3g/3WVqrAv8KDkYEfgm8X6+vT16jLqJGnvWu4qfStpmGiGgLquNS1wVc35+w07h6sXVMqjlmSpkocvqF3mp4y0/a0SdtQm0sDeE2pt1J9UbfePlmXugb/qk/6InuITPLeST0zYydTH1r6HPZFTrqxGQb+6Dty0yxpL8DTXMCzHP82VxTa/NmaTsG/QFV10UsKPNOWKdNFbWhj/d79Pj83Ovn5lvfsXKrAP89rO2vFUwT/tDXqaWrkSRe70f1edUWhSG3+VtN0Cv4tFBd4pk2DJL334uyZsRGtv+eXfLH3anDh6b3q/T2/lCpax6V4ki4AZsHnrarmPy7NNbrf29qXP4+af1v3TdUU/FsozRhnk/4jxtbk2NwZ0ZKuQEmPRyyT1vJXV+PXxxkONL2e+9zc7otKVO19VJqP14XURx7tJm39VlQ1Bf8WyjryQZav4LE1uUGD7ezs+KtPXJU8ZkkabjmuPIObe7ME3ahAs2fPdnFHP5ZZ/IUk6r0m2e9tqPVO8xnUZlAcBf8WiqstTZsGiX3vFA22kZEv5XOjcv7DbRjjgmzaoJsUaOIeM4sPZoOgl/UiNO44NvECMCm1GRRHwb9lhgPO6NA2eQWTfj/I8Q963mQO/BMsO3r7LMbny5Nq/OOCblKgGTeU0Lj9Ncl+V61X+6BICv4tkibIjOa0Rye3mmpjaYN3mgvGhBEza0PrsElq/mlroZOkPlTr1befIin4t0iWWlIu/1Qpq9yRaRte8z6/EP2aQaPwBBEzSxfLLPuk348PxprFq1htaPeoIwX/FslSU8w9sCREx9gG273ndnenmZub6r+7yLtyV1d3f8Qia6Gq9UqRFPxbJEtALySlEFPlTpwft4BqXd3ubm7S9qQ74oK/RvVsoCyjQxYy6mjMmy5xkg2W8t2WiExFo3q2SJb5ZSPnrLULHNlYHj/0YtwwjVFvChzhA8zz3Z3bmmLS8zJHidSIlNI5UV8H6rgo7TO5SykFtnzRNnb2wIlLLo9LRI/mKVZX3RcXvc8vhN1Dt6ZKX5SZB1fOXdoM5fwlSxeZfu/+6C6bJXVBKbMHTBt726gNQQYU/CV15/jV1d2DqV26u3e4pTgiwuQVdMrs+962fvb6JiPDFPwl1WBA/d79bmxF14Q5uV0djhnNc3Q450mDjmr+k2vb55HpxAV/Nfh2SYqW17Xz/xbHIh87zcL2e0TMSbj2/Q9y4Xs7p1ucdJrCyIbqKRqP67KtMmhqUUlDwb/thruxrK3BlVcmPv00C7GPLXB6u0tRRCSJe+0kQSdLj6ZplbmtMizEHMKo9erl1F0K/m02uCFgYyP45r+xAd//PuzZE/uShZlvRq43tjjS+62hJ+6OJAtER/m4YDTO8nJwf8DWVvCzqcG4bGm/yUSdHisrugB0RlQuqI6Lcv4RxrWuJk20MnhsZmZn3j5mKsXVmd9LNYNXXjn/tB8xD21sIE2z39Q20A2owbdl0vTDT2rcHUSFiMH/+xz2xZnT2908e/dHRo/+6pe3h3yePeP91S/nGqzLCspFB8G6drtsWy8niabg3zZJEWuSmU8yRoAyAnNZNdMig2Cdv1Wo5t8NccFfOf8mWl+PHrAHgtbViJ44mY1J1EdtImvPnnGNjWX1WsnSQJpVHvupKG3r5SQZRV0R8liADwHfBJ4OlzuHHnsAOAEcB25P836q+YfG1eoH+YVpav1TTKaStracpkZcVs20yNp53VMrdU1JSX4oO+0TBv/3Rqy/AfgacDlwHfB3wOy491PwDyXdqDWIWFlnOx80AOcwmUrawJzm9WWP71NEEFRqRaoWF/yrSPvcBTzi7q+7+0mCbwC3VFCOZkrKeQw6p8d9n19Z2b3eDM6fD35/6KHUfSqnTRmkSemU3dc/rlvpNH3hlVqR2oq6IuSxENT8TwHPAJ8ErgrXfxS4Z+h5nwB+PuY9VoBjwLGFhYVCr46NkbYqGVeVHf5mMOV0VdPUlptSI87j24dSK1IlipjMxcyeAN4Q8dAa8FXgHODAfwCucfd3mdnHgK+4ez98j08An3f3P0zaliZzCWWZySVJIbO8pJfXxyhaxbtJZGpxk7lcFvXktNz9rSk3/iDwufDPs8DBoYcPAC9MU45OGUTGtbUgR7IQjreTNWJWPABMXh+jaBonR9qqsJy/mV0z9OfbgWfD3x8D7jazy83sOuB64MmiytFKgwT1Qw8Ff7/zndmT0UX2b0xpNM8O9Rtnpga7SaQQRTb4fsTM/srMngH+OfBvANz9OeBR4OvAnwD3uftmgeVop2kHZqlZS2Rdx5mp2W4SyU9UQ0AdF3X1HJFHi2mNWiLr3ABco90kkhlFNPiWSQ2+ofX1IFEed4evWZBHaZiZmSDcj2rixxkcojq3ZUh3xDX4aniHHBQ+JvpgA2ZBfj8u8ENjk9Ftya3XNX0lMkrBf0qF/7MPbwCiq8cDDU5GtyW3XuexfESGKfhPqfB/9rSDtDV8+qm2zKalrqHSFMr5T6nwXHXcBobpjqPa0E1hUjfK+Rek8Fz1uDdqYm6kxdqSvpL2U/Cf0tT/7ONai6M2YBb8bGpupMXSpK80abrUQlT/zzoude7nn7of+OgTV1fTjRqmjuatUfXMXjqVugf1869Y1EhmZtH5fCWIW6vKNoGmDKYn+YrL+Sv4lyXuvz5KE+9sklSqvJlNjdHdpAbfqmXp69e0O5sktSpvZlM3VBmm4F+WtP/dNewaogbK/FTZG6gtd1FLPhT8yxL1Xz+q16tdAlbDFeSrypvZ1A1VhinnX6Zxg7LVMPmqPHG7aNC57lHOvwzj8iOD2UsG/fRH1TD52oQ8sdJS6SVNVC/douCflyz5kQYlX+teVKWlRCaj4J+XLCO8NSj5WveiahRNkcko+E9qNNcQl8ePyo80aAjLuhe1CWmpKiklJnHU4DsJ3a1bG2qQjqc7egXU4JuvqFyD++6G3DrlR1qq7mmpKiklJkkU/CcRl1Nwr29+pKXqnpaqklJikuSyqgvQSAsLyjXUyPKygn2UuNO0Lj21pFqq+UP2VjHlGmpLDZzbdJpKoqhxnuu4FDae/6QDrGtg9Nqpeqz8OtJpKmg8/xjqLtIaOpQiu6m3Txy1irWGDqVIegr+dR+/QFLToRRJT8FfrWKtoUMpkt5Uwd/M3mFmz5nZlpkdGnnsATM7YWbHzez2ofU3m9lfhY/9tlncEJclUUfx1tChFElvqgZfM/txYAv4feC97n4sXH8D8DBwC/AjwBPAG91908yeBN4DfBX4PPDb7v6Fcduq1fAOIiINUUiDr7s/7+7HIx66C3jE3V9395PACeAWM7sG+CF3/0rYBekzwM9NUwYREcmuqJz/tcCZob/PhuuuDX8fXS8iIiUaO7yDmT0BvCHioTV3/2zcyyLWecL6uG2vACsAC+qyISKSm7E1f3d/q7vfGLHEBX4IavQHh/4+ALwQrj8QsT5u20fd/ZC7H9q/f/+4ou6me/1bQ4dSJF9FpX0eA+42s8vN7DrgeuBJd/8W8KqZvTns5fOLQNJFZHKa3681dChF8jdtb5+3A78D7Ae+DTzt7reHj60B7wIuAr8+6NETdgn9FPADwBeA+z1FITL39tG9/q2hQykyubjePu0d22dmJnpmLTPY2sqvYFI4HUqRyXVvbB/d698aOpQi+Wtv8Ne9/q2hQymSv/YGf93r3xo6lCL5a2/OX0REOpjzFxGRWAr+IiIdpOAvItJBCv4iIh2k4C8i0kGN6e1jZi8BETf519o+4FzVhchIZS5e08oLKnNZiijzorvvGhmzMcG/iczsWFQXqzpTmYvXtPKCylyWMsustI+ISAcp+IuIdJCCf7GOVl2ACajMxWtaeUFlLktpZVbOX0Skg1TzFxHpIAV/EZEOUvAvkJn9ppn9tZk9Y2Z/ZGY/XHWZxjGzd5jZc2a2FU65WVtmdoeZHTezE2b2/qrLM46ZfdLMXjSzZ6suS1pmdtDM/sLMng/Pi/dUXaZxzOwKM3vSzL4WlvnfV12mtMxs1sz+j5l9ruhtKfgX63HgRnf/x8DfAA9UXJ40ngX+JfClqguSxMxmgY8BbwNuAA6b2Q3VlmqsTwF3VF2IjC4C/87dfxx4M3BfA/bz68BPufs/AW4C7jCzN1dbpNTeAzxfxoYU/Avk7n/q7hfDP78KHKiyPGm4+/PufrzqcqRwC3DC3b/h7t8DHgHuqrhMidz9S8DLVZcjC3f/lrv/7/D3VwkC07XVliqZB14L/9wTLrXv2WJmB4CfBv6gjO0p+JfnXcAXqi5Ei1wLnBn6+yw1D0pNZ2ZLwD8F/mfFRRkrTJ88DbwIPO7utS8z8F+A3wC2ytjYZWVspM3M7AngDREPrbn7Z8PnrBF8fV4vs2xx0pS5ASxiXe1rd01lZnuBPwR+3d2/U3V5xnH3TeCmsJ3tj8zsRnevbVuLmf0M8KK7P2Vmt5axTQX/Kbn7W5MeN7N7gZ8BbvOa3FQxrswNcRY4OPT3AeCFisrSama2hyDwr7v7f6+6PFm4+7fN7C8J2lpqG/yBtwA/a2Z3AlcAP2RmfXe/p6gNKu1TIDO7A3gf8LPufqHq8rTM/wKuN7PrzGwOuBt4rOIytY6ZGfAJ4Hl3/62qy5OGme0f9Kwzsx8A3gr8daWFGsPdH3D3A+6+RHAu/3mRgR8U/Iv2UeAHgcfN7Gkz+72qCzSOmb3dzM4CPwH8sZl9seoyRQkb0n8N+CJBI+Sj7v5ctaVKZmYPA18BfszMzprZu6suUwpvAd4J/FR4Dj8d1k7r7BrgL8zsGYJKwuPuXnjXyabR8A4iIh2kmr+ISAcp+IuIdJCCv4hIByn4i4h0kIK/iEgHKfiLiHSQgr+ISAf9f4zs0paLAJMLAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "pylab.scatter(data[:,0], target, color = 'r')\n", "pylab.scatter(data[:,1], target, color = 'b')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Based on the above plot one can find out which of 2 features is informative." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Let's build a model and view its coefficients.\n", "We split data for train and test sets." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "train_data, test_data, train_labels, test_labels = model_selection.train_test_split(data, target, \n", " test_size = 0.3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### LinearRegression over the train data" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "linear_regressor = linear_model.LinearRegression() # classificator\n", "linear_regressor.fit(train_data, train_labels)\n", "predictions = linear_regressor.predict(test_data)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ -76.75213382 34.35183007 -11.18242389 -61.47026695 44.66274342\n", " -13.26392817 18.17188553 25.7124082 -19.16792315 101.14760598\n", " -105.77758163 23.87701013 12.42286854 -18.57607726 21.20540389\n", " 38.36241814 -45.38589148 29.8208999 -84.32102748 0.34799656\n", " 11.96165156 39.70663436 41.95683853 -10.06708677 -63.4056294\n", " -16.30914909 -45.27502383 -57.46293828 28.15553021 -17.27897399]\n" ] } ], "source": [ "# original labels\n", "print(test_labels)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ -68.31690488 38.87063362 -12.62644748 -55.97354269 50.53828408\n", " -15.92863777 18.48923956 28.00480498 -10.77148765 95.936183\n", " -100.85245811 31.5152017 6.88671402 -24.57940803 16.71365889\n", " 41.26798296 -43.31634025 31.48688292 -80.45697523 -1.51872908\n", " 13.95215963 37.61637807 43.5105848 -9.47060759 -59.39279793\n", " -11.74032551 -47.34651564 -53.9339432 22.64507627 -13.03856029]\n" ] } ], "source": [ "# predicted labels on the test objects\n", "print(predictions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets count MAE of those original dataset labels to the predicted labels." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.859435388011848" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "metrics.mean_absolute_error(test_labels, predictions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use **cross_val_score()** to evaluate our linear regressor, the regressor eveluation will be more precise. \n", "\n", "Scoring is MAE (neg_mean_absolute_error). \n", "\n", "We cross-validate the data using 10 folds." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mean: -4.070071498779695, std: 1.0737104492890204\n" ] } ], "source": [ "linear_scoring = model_selection.cross_val_score(linear_regressor, data, target, scoring = 'neg_mean_absolute_error', \n", " cv = 10)\n", "print('mean: {}, std: {}'.format(linear_scoring.mean(), linear_scoring.std()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create our own scorer with parameter **greater_is_better**." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "scorer = metrics.make_scorer(metrics.mean_absolute_error, greater_is_better = True)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mean: 4.070071498779695, std: 1.0737104492890204\n" ] } ], "source": [ "linear_scoring = model_selection.cross_val_score(linear_regressor, data, target, scoring=scorer, \n", " cv = 10)\n", "print('mean: {}, std: {}'.format(linear_scoring.mean(), linear_scoring.std()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at the coefficients of the inbuilt dataset **make_regression()** function that we've got." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([38.07925837, 0. ])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coef" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The coefficients of the model built with **LinearRegression()**:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([37.86162519, 0.33738658]), -0.13052467965349365)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "linear_regressor.coef_, linear_regressor.intercept_" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-0.13052467965349365" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# In the regression model there is also the intercept.\n", "linear_regressor.intercept_" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original function equation\n", "y = -0.13 + 38.08*x1 + 0.00*x2\n" ] } ], "source": [ "print(\"Original function equation\\n\\\n", "y = {:.2f} + {:.2f}*x1 + {:.2f}*x2\".format(linear_regressor.intercept_, coef[0], coef[1]))" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Trained function equation\n", "y = 37.86*x1 + 0.34*x2 \n" ] } ], "source": [ "print(\"Trained function equation\\n\\\n", "y = {:.2f}*x1 + {:.2f}*x2 \".\\\n", " format(linear_regressor.coef_[0], \n", " linear_regressor.coef_[1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Lasso regularizator or L1 for the Linear Regression" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "lasso_regressor = linear_model.Lasso(random_state = 3) # build model\n", "lasso_regressor.fit(train_data, train_labels) # train it with the train set\n", "lasso_predictions = lasso_regressor.predict(test_data) # get predictions using test set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We eveluate the model quality by cross-validation. We'll use the same scorer." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mean: 4.154478246666398, std: 1.0170354384993354\n" ] } ], "source": [ "lasso_scoring = model_selection.cross_val_score(lasso_regressor, data, target, scoring = scorer, cv = 10)\n", "print('mean: {}, std: {}'.format(lasso_scoring.mean(), lasso_scoring.std()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that std has decreased comparing to L2 regularizator: std: 1.0737104492890204" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[37.0580843 0. ]\n" ] } ], "source": [ "print(lasso_regressor.coef_)" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original function equation\n", "y = -0.13 + 38.08*x1 + 0.000*x2\n" ] } ], "source": [ "print(\"Original function equation\\n\\\n", "y = {:.2f} + {:.2f}*x1 + {:.3f}*x2\".format(linear_regressor.intercept_, coef[0], coef[1]))" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Trained Lasso function equation\n", "y = 37.06*x1 + 0.0000*x2\n" ] } ], "source": [ "print(\"Trained Lasso function equation\\n\\\n", "y = {:.2f}*x1 + {:.4f}*x2\".format(lasso_regressor.coef_[0], lasso_regressor.coef_[1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The advantage of L1 (Lasso) regularization is that we got the weight of 0.0000 value before non-informative/redundant feature." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 1 }