Diving into Machine Learning

Hello all!

So lately I’ve been messing with machine learning because I’ve always been interested in it and I find it really cool. I’d like to talk a bit about what I’ve been doing and struggling with, and show some examples. I’ll be working with scikit-learn (sklearn) for Python, which ships with several small example datasets. The three I’m looking at are Iris and Digits, which are for classification, and Boston House Prices, which is for regression. Heads up: the Boston dataset was removed in scikit-learn 1.2, so newer versions won’t have it. Simply put, classification is identifying which category something belongs to, like recognizing a handwritten number as the digit it actually is. Regression is predicting a continuous value, which in the simplest case means finding a line of best fit for a dataset. I still have a lot to learn about sklearn and machine learning in general, but I find it really interesting nonetheless and thought you guys would too.
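To make the classification vs. regression distinction a bit more concrete, here’s a minimal sketch. It loads the two classification datasets that still ship with scikit-learn and prints their shapes. Then it fits a straight line to a handful of made-up points with LinearRegression, which is about the simplest form of regression. The toy points, generated from y = 2x + 1, are my own assumption for illustration and don’t come from sklearn.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

# Classification: each sample gets a discrete label (an iris species or a digit 0-9)
iris = datasets.load_iris()
digits = datasets.load_digits()
print(iris.data.shape, iris.target.shape)      # 150 flowers, 4 measurements each
print(digits.data.shape, digits.images.shape)  # 1797 digits, as 64 flat pixels or 8x8 images

# digits.data is digits.images flattened: each 8x8 image becomes one row of 64 values
assert np.array_equal(digits.data, digits.images.reshape(len(digits.images), -1))

# Regression: predict a continuous number by fitting a line of best fit.
# These toy points are made up for illustration and lie exactly on y = 2x + 1.
X = np.array([[0], [1], [2], [3], [4]])
y = 2 * X.ravel() + 1
reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)  # slope close to 2, intercept close to 1
print(reg.predict([[10]]))        # close to 21
```

The digits.data vs. digits.images difference matters later. The model is trained on the flat 64-value rows, but the 8x8 version is what we hand to matplotlib to draw.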

My code begins by importing a bunch of libraries. The only ones I actually use in this example are sklearn and matplotlib. The others are either dependencies or libraries I plan to use in the future.

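import sklearn                   # scikit-learn, the main machine learning library
from sklearn import datasets     # the built-in example datasets
import numpy as np               # multidimensional arrays (sklearn depends on it)
import pandas as pd              # data analysis (not used yet)
import quandl                    # financial data (not used yet)
import matplotlib.pyplot as plt  # plotting, used to display the digit
from sklearn import svm          # support vector machines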

In these imports, sklearn is the main library I’m using to fit my data and predict things, and sklearn.datasets provides the three base datasets: Iris, Digits, and Boston House Prices. I don’t know much about sklearn.svm yet. I do know that it provides support vector machines (SVMs), which learn a boundary that separates the different classes in the training data. That trained model is what lets us feed in testing data and have it determine which number we have written. NumPy is a science / math library that adds support for large multidimensional arrays and matrices. Pandas is a library for data analysis. Quandl is a financial library that lets me pull a lot of data that I can use for linear regression in the future. And matplotlib and its sub-library pyplot let me display the handwriting data as an image.
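Since the SVM is the part I understand least, here’s a small sketch of what "separating the data" means. It uses an SVC with a linear kernel, rather than the default RBF kernel my digits code uses, on a few made-up 2D points in two clusters. The points are invented for illustration. The SVM learns a boundary between the clusters, and new points are classified by which side of that boundary they fall on.

```python
import numpy as np
from sklearn import svm

# Two made-up clusters of 2D points: class 0 near (0, 0), class 1 near (5, 5)
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [5, 5], [6, 5], [5, 6], [6, 6]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A linear kernel means the learned boundary is a straight line in 2D
clf = svm.SVC(kernel="linear")
clf.fit(X, y)

# New points are labelled by which side of the boundary they land on
print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))  # expected: [0 1]

# The support vectors are the training points closest to the boundary;
# they are what determine where the separating line sits
print(clf.support_vectors_)
```

With the digits, the same idea happens in 64 dimensions (one per pixel) and with ten classes instead of two, so it’s much harder to picture.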
So far my code for the recognition looks like this:

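digits = datasets.load_digits()  # 1797 8x8 images of handwritten digits
clf = svm.SVC(gamma=0.001, C=100)  # support vector classifier (our estimator)
clf.fit(digits.data[:-1], digits.target[:-1])  # train on every digit except the last
print(clf.predict(digits.data[-1:]))  # predict the held-out last digit
print(digits.target[-1:])  # the true label, to check the prediction
plt.figure(1, figsize=(3, 3))
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()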

Although my understanding is rudimentary, I can explain a little bit of what this does. clf is our estimator, the actual machine that is learning, and clf.fit() is how we pass our training data through it. clf.fit() takes the digit images along with their correct labels and trains the SVM we built clf from, so it learns what each number should look like. I’m passing every digit except the last one to this function, because we’ll be testing with the last one. We then pass that last digit, a known handwritten 8, to clf.predict(). clf outputs array([8]), which means it has predicted our input as an 8. If we print digits.target[-1:] we can see the correct answer and check whether the prediction was right. To see the digit for ourselves, we use the matplotlib lines that create the figure, draw the image, and show it. The figure we get is this:
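Testing on a single held-out digit only tells us whether the machine got one example right. A more common pattern, sketched below, is to hold out a larger chunk of the data as a test set with sklearn’s train_test_split and measure the accuracy on all of it. The 25% test size and the random_state value are arbitrary choices of mine. The SVC parameters are the same as in my code above.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# Hold out 25% of the digits for testing; the model never sees them during fit()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Same estimator settings as in the code above
clf = svm.SVC(gamma=0.001, C=100)
clf.fit(X_train, y_train)  # training: learn from images paired with their correct labels

predicted = clf.predict(X_test)       # testing: guess labels for images it hasn't seen
accuracy = clf.score(X_test, y_test)  # fraction of test digits predicted correctly
print(f"Accuracy on {len(y_test)} held-out digits: {accuracy:.3f}")
```

The idea is the same as holding out the last digit, just with hundreds of test examples instead of one. That gives a much better sense of how well the model handles handwriting it hasn’t seen.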

It’s very low resolution, but it’s an 8! I think this is brilliant, and I definitely need to learn more about what is happening in my code. Machine learning is very cool and I want to keep experimenting with it. So far I’m learning some of the basic elements, like how to fit and predict, how training and testing sets work, and a lot of the vocabulary used when talking about machine learning. I can now actually talk about things like supervised and unsupervised learning, or classification and regression methods. Along with this, I’m learning more about other libraries like matplotlib, and how to write more Pythonic (idiomatic, readable) code. For anyone who wants to try this themselves, there’s a lot of really cool stuff online, but I’m using some of the resources from hangtwenty’s GitHub repo dive-into-machine-learning, which can be found here: https://github.com/hangtwenty/dive-into-machine-learning

Hopefully by my next post I’ll have a basic understanding of linear regression and can create some cool examples with it. I’ll also attempt to explain how fitting, predicting, and training actually work.
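Since I mentioned supervised and unsupervised learning, here’s a small sketch of the difference using the Iris dataset. The supervised model (an SVC, as before) is trained on both the flower measurements and the species labels. The unsupervised one only sees the measurements and has to group the flowers on its own. It uses KMeans clustering, which isn’t something I’ve used in my own code yet. Choosing 3 clusters is my assumption, based on knowing Iris has three species.

```python
from sklearn import datasets, svm
from sklearn.cluster import KMeans

iris = datasets.load_iris()

# Supervised: fit() receives the inputs AND the correct answers (species labels)
clf = svm.SVC()
clf.fit(iris.data, iris.target)
print(clf.predict(iris.data[:5]))  # predicted species for the first five flowers

# Unsupervised: fit() receives only the inputs; KMeans groups similar flowers itself
km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(iris.data)
print(km.labels_[:5])  # cluster numbers are arbitrary ids, not species names
```

The main difference is visible in the fit() calls: the supervised model is told the answers, while the clustering model never sees iris.target at all.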

Thanks for reading and have a wonderful day!
~ Corbin