Capstone Project – The Battle of Neighborhoods

Project Report | Capstone Project – The Battle of Neighborhoods

1. Introduction

This project aims to find the better neighborhood environment in terms of nearby amenities such as pubs, parks, and gyms. Using a map of Scarborough, Toronto, it helps people decide which neighborhood is the most beneficial place to live compared with the other neighborhoods under consideration.

Many people are migrating to Toronto, and they need information and resources to balance housing prices against schools for their children. This project is for people choosing among neighborhoods based on factors such as access to cafes, schools, supermarkets, hospitals, etc.

This project analyses the features that help people migrating to Scarborough search for the best neighborhood. The features include median housing price, better schools, lower crime rates, road connectivity, well-managed emergency facilities, and recreational facilities.

This gives people an awareness of the area before they move to a new city.

Foursquare API Data:

Foursquare provides data about the venues in each neighborhood, including venue names, locations, menus, and even photos. The required information is obtained from the Foursquare platform through its API.

Once the neighborhood information is available, the Foursquare API is used to gather information about the venues in each neighborhood. For each neighborhood, the radius is 100 meters.

Foursquare data contains venues, longitude, latitude and postcodes. The information obtained per venue is as follows:

  1. Neighborhood
  2. Neighborhood Latitude
  3. Neighborhood Longitude
  4. Venue
  5. Name of the venue e.g. the name of a store or restaurant
  6. Venue Latitude
  7. Venue Longitude
  8. Venue Category
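As a rough illustration of how the fields above could be pulled out of an API reply, the sketch below flattens a nested response into one row per venue. The nesting (`response` → `groups` → `items` → `venue`) is an assumption about the shape of a Foursquare "explore" reply, and the sample venue, the neighborhood name, and the helper `flatten_venues` are hypothetical.

```python
# Illustrative sample only: the nesting is an assumed response shape,
# and the venue itself is made up.
sample_response = {
    "response": {
        "groups": [
            {
                "items": [
                    {
                        "venue": {
                            "name": "Example Cafe",
                            "location": {"lat": 43.7701, "lng": -79.2301},
                            "categories": [{"name": "Coffee Shop"}],
                        }
                    }
                ]
            }
        ]
    }
}


def flatten_venues(neighborhood, nb_lat, nb_lng, response):
    """Return one row (dict) per venue with the fields listed above."""
    rows = []
    groups = response.get("response", {}).get("groups", [])
    for group in groups:
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories", [])
            rows.append({
                "Neighborhood": neighborhood,
                "Neighborhood Latitude": nb_lat,
                "Neighborhood Longitude": nb_lng,
                "Venue": venue["name"],
                "Venue Latitude": venue["location"]["lat"],
                "Venue Longitude": venue["location"]["lng"],
                # A venue may have no category; keep the row anyway.
                "Venue Category": categories[0]["name"] if categories else None,
            })
    return rows


rows = flatten_venues("Example Neighborhood", 43.77, -79.23, sample_response)
```

A list of rows like this can be passed directly to a Pandas dataframe constructor for further analysis.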

Map of Scarborough

2. Data

Data Link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

In this project, I use the Scarborough dataset that was scraped from Wikipedia in Week 3. The dataset consists of latitude, longitude, and postal codes.
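A minimal sketch of the scraping step, assuming the postal-code table is a plain HTML table with postal code, borough, and neighbourhood columns. The real page layout may differ, the inline `sample_html` is a simplified stand-in for the downloaded page, and the class `TableRows` is a hypothetical helper built on the standard-library parser rather than the Beautiful Soup code used in the project.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Simplified stand-in for the downloaded page (assumed column layout).
sample_html = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1B</td><td>Scarborough</td><td>Malvern, Rouge</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M1C</td><td>Scarborough</td><td>Rouge Hill, Port Union, Highland Creek</td></tr>
</table>
"""

parser = TableRows()
parser.feed(sample_html)
header, *records = parser.rows

# Keep only the rows whose borough is Scarborough.
scarborough = [dict(zip(header, r)) for r in records if r[1] == "Scarborough"]
```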

3. Methodology Section

Clustering Approach:

To compare the similarities of two cities, we decided to explore neighborhoods, segment them, and group them into clusters so that similar neighborhoods can be found in a big city such as New York or Toronto. To do that, we need to cluster the data, which is a form of unsupervised machine learning. We use the k-means clustering algorithm.
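To show what k-means does, here is a from-scratch sketch of the standard assign-and-update loop (Lloyd's algorithm). It is not the scikit-learn implementation used in the project: the function `kmeans`, the toy 2-D points, and the choice to start the centroids at the first k points are all assumptions made for a short, deterministic example.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Centroids start at the first k points, which keeps the run
    deterministic; real libraries use smarter initialisation.
    """
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        # Stop once neither the labels nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Toy feature vectors forming two well-separated groups.
points = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0), (11.0, 11.0)]
labels, centroids = kmeans(points, k=2)
```

In the project, each point would be a neighborhood's feature vector rather than a toy coordinate. The result depends on the starting centroids, which is why library implementations typically run several initialisations.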

K-Means Clustering Algorithm

Most Common venues in each Neighborhood
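One possible way to tally the most common venue categories per neighborhood is sketched below. The (neighborhood, category) pairs are hypothetical placeholders, and `most_common_categories` is an illustrative helper, not the code used to produce the table.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, venue category) pairs for illustration.
venues = [
    ("Neighborhood A", "Coffee Shop"),
    ("Neighborhood A", "Park"),
    ("Neighborhood A", "Coffee Shop"),
    ("Neighborhood B", "Gym"),
    ("Neighborhood B", "Pub"),
    ("Neighborhood B", "Gym"),
    ("Neighborhood B", "Gym"),
]


def most_common_categories(pairs, top_n=3):
    """Map each neighborhood to its top_n categories by venue count."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        neighborhood: [cat for cat, _ in counter.most_common(top_n)]
        for neighborhood, counter in counts.items()
    }


top = most_common_categories(venues, top_n=2)
```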

Work Flow:

Using my Foursquare API credentials, the features of each neighborhood are gathered and used. Because of request limitations, the radius parameter for each neighborhood is set to 700 and the maximum number of places per request is set to 100.
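A request with these settings might be assembled as in the sketch below. The endpoint and parameter names (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`) are assumed to follow the legacy Foursquare v2 "explore" call, and the credentials, coordinates, and version date are placeholders. The code only builds the URL and sends nothing.

```python
from urllib.parse import urlencode

# Assumed endpoint of the legacy Foursquare v2 "explore" call.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      radius=700, limit=100, version="20180605"):
    """Build the request URL for venues around one neighborhood."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # placeholder API version date
        "ll": f"{lat},{lng}",  # neighborhood latitude,longitude
        "radius": radius,      # search radius used in this project
        "limit": limit,        # maximum number of places returned
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", 43.7701, -79.2301)
```

The resulting URL could then be fetched with the Requests library and the JSON reply parsed per neighborhood.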

4. Results

Map of Foursquare Request

Map of Clusters in Scarborough

Average Housing Price by Clusters in Scarborough

Schools Rating by Clusters in Scarborough

5. Discussion

Problem Solved:

The purpose of this project is to suggest a better neighborhood in Scarborough to people. Factors such as connectivity to the airport, bus stops, distance to downtown, and markets all count.

Sorted list of houses by price, in ascending or descending order
Sorted list of schools by location, fees, rating, and reviews

6. Conclusion

With the help of the k-means clustering algorithm, the neighborhoods are separated into 10 clusters using 103 different latitude and longitude pairs from the dataset. The dataset contains similar neighborhoods close to each other. The charts represent a particular neighborhood with its average house prices and school ratings.

I really appreciate this opportunity and the experience gained from working through all the tasks. This project is a practical application of data science tools to a real situation. Mapping with Folium is a useful way to consolidate information and visualize the analysis.

Improvement:

With further work, this project could be more precise in finding the best house in Scarborough, based not only on price but also on other factors in the surrounding area.

Library Dependencies:

Pandas: To create and edit dataframes.

Folium: To visualize the neighborhood clusters distribution.

Scikit Learn: To import clustering algorithms.

JSON: To handle JSON files.

XML: To separate data from presentation; XML stores data in plain-text format.

Geocoder: To retrieve location data.

Beautiful Soup and Requests: To extract data from HTML and XML.

Matplotlib: To draw plots.