1. Introduction
Moving to a new city can be puzzling. All areas look the same and deciding where to establish your nest is a tricky process.
The aim of this recommendation system is to rank the districts of Berlin (Germany) according to the importance an apartment seeker attributes to the following criteria:
- Green Spaces (number and dimension of parks within the district)
- Kindergarten (number of kindergartens in the area)
- Nightlife (number of clubs and bars in the area)
- Eating Out (number and variety of restaurants)
- Quality Yoga (availability and average rating of yoga schools)
The six districts we are going to consider in the recommendation system are:
- Mitte
- Pankow
- Charlottenburg-Wilmersdorf
- Tempelhof-Schöneberg
- Neukölln
- Friedrichshain-Kreuzberg
We asked five apartment seekers to weight each criterion on a scale from 0 to 4.
The desired outcome of the recommendation system is a ranking of the six districts that is calibrated to the weights specified by each individual apartment seeker.
E.g. “The best districts for Brenda are 1. Neukoelln 2. Schoeneberg 3. Pankow 4. Mitte 5. Kreuzberg 6. Charlottenburg”
To do that, we first score each district on each criterion, then multiply the scores by the personal weights provided by the apartment seekers.
2. Data Acquisition & Cleaning
- Zip codes of the districts, retrieved from their respective Wikipedia pages (e.g. https://de.wikipedia.org/wiki/Berlin-Mitte) => cleaning involved translating the column names to English.
- Geo coordinates of the zip codes, retrieved through http://api.zippopotam.us/de/ => cleaning involved correcting one erroneous latitude/longitude pair.
- Venue information and details, retrieved through the Foursquare API => cleaning included dropping unneeded columns, merging information and dealing with duplicates.
- Green Spaces
– Park information retrieved from this Wikipedia page => cleaning involved dealing with missing values, translating column names and converting the units of measurement.
– District dimensions retrieved from this Wikipedia page => cleaning involved translating column names and converting the units of measurement.
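The coordinate lookup in the list above can be sketched as follows. This is a minimal illustration, not the code used for the project: the helper names `parse_coordinates` and `fetch_coordinates` are hypothetical, the sample payload uses made-up coordinates, and the response layout (a `places` list whose entries carry `latitude` and `longitude` as strings) is assumed from the service's public JSON format.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://api.zippopotam.us/de/"  # endpoint named in the text


def parse_coordinates(payload):
    """Return (latitude, longitude) of the first place in a response payload."""
    place = payload["places"][0]
    # Coordinates are assumed to arrive as strings, so convert them to floats.
    return float(place["latitude"]), float(place["longitude"])


def fetch_coordinates(zip_code):
    """Download and parse the record for one zip code (needs network access)."""
    with urlopen(BASE_URL + zip_code, timeout=10) as response:
        return parse_coordinates(json.load(response))


# Illustrative payload with made-up coordinates, shaped like the assumed response.
sample = {
    "post code": "10115",
    "places": [{"place name": "Berlin", "latitude": "52.53", "longitude": "13.38"}],
}
sample_coordinates = parse_coordinates(sample)
print(sample_coordinates)
```

Keeping the parsing separate from the download makes it easy to spot and correct a single bad latitude/longitude pair without calling the service again.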
3. Methodology
3.1 Green Spaces
I based the score on the idea that the more green space, the better. First I used this formula
Then I normalized the score by dividing it by the maximum value among the six districts
3.2 Kindergarten
I based the score on the assumption that the postal areas of Berlin are similarly populated, which makes the number of kindergartens per postal area a reasonable metric. First I used the formula
Then I normalized the score by dividing it by the maximum value among the six districts
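A minimal sketch of this two-step scoring, assuming the raw score is the kindergarten count divided by the number of postal areas in the district. The function name `kindergarten_scores` and all counts below are hypothetical and only illustrate the arithmetic.

```python
def kindergarten_scores(kindergartens, zip_areas):
    """Kindergartens per postal area, normalized by the best district."""
    # Assumed raw score: kindergarten count / number of postal areas.
    raw = {d: kindergartens[d] / zip_areas[d] for d in kindergartens}
    best = max(raw.values())
    # Dividing by the maximum puts the best district at exactly 1.0.
    return {d: value / best for d, value in raw.items()}


# Made-up counts, for illustration only.
kindergartens = {"Mitte": 60, "Pankow": 90, "Neukölln": 30}
zip_areas = {"Mitte": 20, "Pankow": 20, "Neukölln": 15}

kita_scores = kindergarten_scores(kindergartens, zip_areas)
print(kita_scores)
```

With these numbers Pankow has the most kindergartens per postal area and receives a score of 1.0, while the other districts are expressed as a fraction of that value.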
3.3 Nightlife
For each district, I first calculated the number of bars/clubs per zip area, then derived the score from the following relationship
The idea is that the more bars/clubs you have in your postal area the better, but the benefit does not grow linearly.
In other words: going from 5 bars to 10 adds a lot more value than going from 20 bars to 30.
I didn’t have time to conduct ethnographic research to determine where this curve plateaus, so I followed my gut. It could be true.
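To illustrate the diminishing-returns idea, here is one possible saturating curve. It is an assumption chosen for illustration, not necessarily the relationship used in this project: the exponential form, the function name `nightlife_score` and the `scale` parameter (which controls how quickly the curve flattens) are all hypothetical.

```python
import math


def nightlife_score(bars_per_zip_area, scale=10.0):
    """Saturating score in [0, 1): rises quickly at first, then flattens.

    The exponential form and the default scale are illustrative assumptions.
    """
    return 1.0 - math.exp(-bars_per_zip_area / scale)


# Gain from 5 to 10 bars versus gain from 20 to 30 bars.
gain_low = nightlife_score(10) - nightlife_score(5)
gain_high = nightlife_score(30) - nightlife_score(20)
print(gain_low > gain_high)
```

With this curve, the first few bars contribute most of the score, and a larger `scale` pushes the plateau further out.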
3.4 Eating Out
I based the score on the variety of cuisines available within the entire district. First I used the formula
Eating Out Score = Number of Unique Restaurant Types in the District
Then I normalized the score by dividing it by the maximum value among the six districts.
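The formula and the normalization step can be sketched like this. The function name `eating_out_scores`, the `(district, category)` record layout and the sample venues are hypothetical; real venue records would come from the Foursquare data described earlier.

```python
from collections import defaultdict


def eating_out_scores(venues):
    """Count unique restaurant types per district, then divide by the maximum."""
    types_by_district = defaultdict(set)
    for district, category in venues:
        # A set keeps each restaurant type only once per district.
        types_by_district[district].add(category)
    raw = {d: len(types) for d, types in types_by_district.items()}
    best = max(raw.values())
    return {d: count / best for d, count in raw.items()}


# Made-up venue records, for illustration only.
venues = [
    ("Mitte", "Italian Restaurant"),
    ("Mitte", "Vietnamese Restaurant"),
    ("Mitte", "Italian Restaurant"),  # repeated type, counted once
    ("Mitte", "Sushi Restaurant"),
    ("Mitte", "Turkish Restaurant"),
    ("Pankow", "Italian Restaurant"),
    ("Pankow", "Falafel Restaurant"),
]

food_scores = eating_out_scores(venues)
print(food_scores)
```

Here Mitte has four distinct restaurant types and Pankow two, so the normalized scores are 1.0 and 0.5.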
3.5 Quality Yoga
I based the score on the average rating given to yoga studios within the whole district. First I used the formula
Then I normalized the score to a scale from 0 to 10.
At the end of the scoring, I had a dataframe containing a score for each district on each of the selected criteria.
All I had to do was iterate over the apartment seekers and multiply the columns of that dataframe by the corresponding weights.
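A dependency-free sketch of this last step, using plain dictionaries in place of the dataframe. It assumes the weighted scores are summed per district to produce the ranking, which the text does not state explicitly. The function name `rank_districts`, the seeker "Sam" and all scores and weights are made up for illustration.

```python
def rank_districts(scores, weights):
    """Rank districts by the sum of (criterion score * personal weight)."""
    totals = {
        district: sum(weights[criterion] * score
                      for criterion, score in criteria.items())
        for district, criteria in scores.items()
    }
    # Highest weighted total first (summing is an assumption of this sketch).
    return sorted(totals, key=totals.get, reverse=True)


# Made-up district scores (0 to 1) and personal weights (0 to 4).
scores = {
    "Mitte": {"Green Spaces": 0.4, "Nightlife": 0.9},
    "Pankow": {"Green Spaces": 1.0, "Nightlife": 0.3},
}
seekers = {
    "Brenda": {"Green Spaces": 4, "Nightlife": 1},
    "Sam": {"Green Spaces": 0, "Nightlife": 4},
}

rankings = {name: rank_districts(scores, w) for name, w in seekers.items()}
print(rankings)
```

The same district scores yield different rankings per person: the seeker who values green spaces sees Pankow first, while the one who only cares about nightlife sees Mitte first.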
4. Results
5. Observations and Conclusions
Despite the heterogeneous criteria, districts like Kreuzberg and Mitte seem to be more desirable across the board than, for example, Neukölln. This result in many ways reflects the actual reputation of the different districts among Berliners and tourists alike.
Had I included average rent and cost of living among the criteria, Neukölln would probably have been at least partly redeemed.