Diving into Machine Learning

Hello all!

So lately I’ve been messing with machine learning because I’ve always been interested in it and it’s just very cool and interesting to me.  I’d like to talk a bit about what I’ve been doing and struggling with and show some examples. I will be working with scikit learn for Python, and it comes with 3 datasets. Iris and Digits are for classification and Boston House Prices are for regression. Simply put classification is identifying something like a handwritten number as the correct number it is and regression is essentially finding a line of best fit for a dataset.  I still have a lot to learn about sklearn and machine learning in general, but I find it really interesting nonetheless and thought you guys would too.

So my code begins with the import of a bunch of libraries.  The only ones I use in my example here are sklearn and matplotlib, the others are simply either dependencies or libraries I plan to use in the future.

import sklearn
from sklearn import datasets
import numpy as np
import pandas as pd
import quandl
import matplotlib.pyplot as plt
from sklearn import svm

In this import, sklearn is the main library I’m using to fit my data and predict things, sklearn.datasets comes with the 3 base datasets Iris Digits and Boston Housing Prices.  I don’t know much about sklearn.svm, but I do know that it is the support vector machine which essentially separates our inputted data and runs our actual machine learning, so when we input testing data it can determine what number we have written. Numpy is a science / math library that adds support for larger multidimensional arrays and matrices. Pandas is a library for data analysis. Quandl is a financial library that lets me pull a lot of data that I can use for linear regression in the future. And matplotlib and it’s sub-library pyplot allow me to output the handwriting data.
So far my code for the recognition looks like this:

clf = svm.SVC(gamma=0.001, C=100)
clf.fit(digits.data[:-1], digits.target[:-1])
clf.predict(digits.data[-1:])
plt.figure(1, figsize=(3, 3))
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()

Although my understanding is rudimentary, I can explain a little bit of what this does. Clf is our estimator which is the actual machine that is learning, and that is what we pass out training data through with clf.fit().  Clf.fit() lets us pass data into the svm that we made clf off of, and it trains our machine to know what the numbers should look like.  I am passing all digits except for the last one through this function, because we will be testing with the last one.  We then pass a digit through clf using clf.predict(),  which passes data for a know handwritten digit, 8, through clf.  Our object clf then outputs the text <code>array([8])</code> which means that it has predicted our inputted number as 8.  If we print out digits.target[-1:] we can see it and determine if it was correct. We do this using out 3 lines from matplotlib that create the figure, print it, and then show it. The figure we get is this: 

It’s a very low resolution, but it’s an 8! I think that this is brilliant, and I definitely need to learn more about what is happening here with my code. Machine learning is very cool and I definitely need to mess with it more and learn more.  So far I’m learning some of the basic elements like how to fit and predict things, how training and testing sets work, and a lot of the vocabulary that is used when talking about machine learning.  I can now actually talk about things like supervised and unsupervised learning, or classification and regression methods.  Along with this, I’m also learning more about other libraries like matplotlib, and how to write more pythonic (readable) code.  For anyone who wants to try this themselves, there’s a lot of really cool stuff online, but I’m using some of the resources from hangtwenty‘s GitHub repo dive-into-machine-learning.  It can be found here: https://github.com/hangtwenty/dive-into-machine-learning Hopefully by my next post I will have created a basic understanding of linear regression and I can create some cool examples using it, and in my next post I will attempt to give my explanation on how fitting, predicting, and training actually works.

Thanks for reading and have a wonderful day!
~ Corbin

Mindsets and Prime Factorization

Hello all!

It’s been a while since I last posted, apologies. I’ll be doing another Project Euler problem today.

Problem 3:

The prime factors of 13195 are 5, 7, 13 and 29.

What is the largest prime factor of the number 600851475143 ?

I’d like to begin by saying that I spent way too long on this problem improving my code after I had already solved it. But this is a very good mindset to have when you are trying to become better at something. Due to the increased complexity of this problem, it is a good idea to write an algorithm for it, and I will likely do this for all future problems.

def largest_prime(n)
  Iterate through each number up until the square root of n
    If the iterable is prime and the modulo of the base and iterable is 0:
      Append the iterable to some empty list
  print the [-1] element of the list, and this is the largest prime

I decided on an algorithm where I will check every number’s divisibility and ‘prime-ness’ up until the square root of the original number. This works because you can have no prime factor larger than the square root of the number. Next, I add every valid number to some list, and then I go back and pick out the largest one from the list and this is the answer’s problem. Now I need to turn this into working code, and then from there, I can improve it.

def largest_prime(base):
  primes = []
  root = math.sqrt(base)
  for i in range(int(root+1)):
    if is_prime(i) == True and base%i == 0:
      primes.append(i)
  prime = primes[-1]
  print(prime)
  print(primes)

In theory, this code should work but now I need to make a function is_prime(n) that allows me to check if a number is prime for some boolean value. I’ll do this by checking every number up until the square root for divisibility, and if I find no divisibility other than 1 and the number itself then I will declare the number as prime.

def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, (int(math.sqrt(n))+1)):
        if n%i == 0:
            return False
    return True

Although all of this code works, it is very inefficient. If I insert a timer using from time import * my code takes an average of roughly 7 seconds to run. I would like to cut this down to 1 second or less in any way I can. One way I can optimize the code is by removing my use of lists, but this only saves me ~1 second. This is where I’ll introduce recursion into my programming repertoire. Recursion is when you have a function that calls on itself repeatedly until the task is done. When the task is done the look breaks and returns your answer.

In order to do this problem using recursion, I have to redefine my algorithm. Instead of using another function is_prime(n) I’ll be taking each number, finding the lowest common divisor. I will then take this new number, (n/lcd), and perform the same method on it until I can no longer be divided and this final number is our largest prime factor.

def largest_prime(base):
  start = clock()
  for i in range(2, int(math.sqrt(base)+1)):
    if base % i == 0:
      return largest_prime(base/i)
  print(base)
  end = clock()
  print(end-start)

This is my final piece of code and it takes ~0.0008 seconds to run. I am very happy with the results as the code is simple and condensed. I’d like to end this post by talking about mindsets and why they matter. When you’re trying to get better at something you should realize that your abilities are not as fixed as you think they are. I see this problem a lot in my high school, kids in my Pre-Calculus class often complain that they are bad at math but then when they like a topic we are learning in the class they do extremely well at it because they enjoy it and put effort into it. A good analogy of this is someone learning to play an instrument. When you’re practicing hitting higher notes on a trumpet you don’t squeak a note and say “I’m bad at this and I should just give up trumpet” you instead say “Okay, I need to put faster air through the trumpet” and you try again, so why not do this with other things you are trying to get good at? If you would like to know some more about mindsets I highly recommend Mindset: The New Psychology of Success by Carol Dweck. It details lots of information and research about growth mindsets and how they can improve your life.

Thanks for reading and have a wonderful day!
~ Corbin

Functions in Programming, List Comprehension, and Even Fibonacci Numbers

Hello all!

Today I’ll be doing another Project Euler problem.

Problem 2:

Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …

By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.

There are many ways to do this problem, but I’ve chosen to create a series of functions that I can run to get Fibonacci numbers until some upper bound, and then the sum of even numbers in a list. I find that abstracting calculations like these into functions is a very easy way to make programs feel and look more organized as well as increasing reusability and convenience. A good example would be that I may need a Fibonacci sequence in a future problem, and using a function like this allows me to just paste this code into a future program and I can recall on it using fib(max) when needed. I’m going to begin by creating a function to find the Fibonacci numbers up to some upper bound. I’ll create the function fib(max) where max is the upper bound of the number we’ll be getting from the Fibonacci sequence. The best way I can come up with to create a Fibonacci sequence is to create a list and use fib[-1] and fib[-2] to grab the latest two numbers in the sequence. Next, I’ll add in a while loop that iterates through the Fibonacci sequence and places the numbers into the list fib[ ], to create the next Fibonacci number. This loop then breaks when we hit the upper bound we placed earlier. After this the function will return the newly filled list fib[ ].

def fib(max):
  fib = [1, 2]
  while(int(fib[-1] + fib[-2]) <= max):
   fib.append(int(fib[-1]+fib[-2]))
  return fib

Next, I need to create a function that will add the even numbers of a list together. I’ll call this function even_sum(list) and to create it I’m going to use something new I learned called List Comprehension. You can learn more about List Comprehension here but I’ll provide a quick overview. List Comprehension allows a smaller and more efficient way to create a list by placing a for loop inside of square brackets. We can also place operators such as if statements inside the brackets. Using this new tool my next function will look like this:

def even_sum(list):
return sum([i for i in list if i%2==0])

Now that we have both of our functions we can just combine the two and run them through each other like so:

even_sum(fib(4000000))

This now returns us with the sum of all even Fibonacci numbers up until 4 million!

Thanks for reading and have a wonderful day!
~ Corbin

I would like to give special thanks to Christian Ferko for teaching me about List Comprehension.

Intro and Multiples of 3 and 5

Hello all!

I’ll insert a small introduction here. Welcome to my blog, my name is Corbin Frisvold and I’m a student interested in furthering my passions for Computer Science, Mathematics, and several other fields. Here my posts will primarily consist of my exploration into becoming a better programmer, but I will likely post other things around on the blog. Anyways, onto my first post!

Today I’m going to be doing a problem from Project Euler.
Problem 1:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

I find it useful to just dive into the code for a small problem like this. We know we want a for loop iterating through 1000 and checking each number’s divisibility by 3 or 5. What I come out with is this:


sum = 0
for i in range(1000):
  if i % 3 == 0:
    sum += i
  elif i % 5 == 0:
    sum += i
print(sum)

What this code does is it creates a variable sum that will be used to store the sum of all our valid multiples. Then we use a for loop increment through all numbers between 1 and 1000. After running the program will print the output. Running this we get sum = 233168. And now we have solved Problem 1! My intention with post frequency is to post as much as I can, but I will attempt to keep a minimum frequency of 1 or 2 posts a week.

Thanks for reading and have a wonderful day!
~ Corbin