Why?

I like data and cannot lie

Summary

Objective: Find new cafes around Montreal to try out

Problem statement: finding cafes at random can be tedious, and blog sites often repeat the same well-known cafes

Solution: Use the Google Places API with a modern data stack to fetch cafes and filter them into a continuously updating list, ranked by Google review ratings and recent review activity to surface trending places

Tools Used

Google Sheets, Apps Script, BigQuery, Google Places API, ChatGPT/Gemini, Cloudflare

MVP

First, to validate that the idea would work, I went with what I know best: Google Sheets + Apps Script. I created a new API key in my GCP project with access to the Places API, plus a few geo helper functions, since I needed a latitude and longitude for each search.
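To make the per-area lookup concrete, here is a minimal sketch of the kind of Text Search request involved. The endpoint and field shapes follow the public Places API (New) docs, but the `buildSearchRequest` helper, the field mask, and the example query are illustrative assumptions, not the exact production script; in Apps Script the request would be sent with `UrlFetchApp.fetch`.

```javascript
// Build a Places API (New) Text Search request for one search-grid point.
// The helper name and field mask are illustrative, not the exact script used.
function buildSearchRequest(apiKey, query, lat, lng, radiusMeters) {
  return {
    url: "https://places.googleapis.com/v1/places:searchText",
    options: {
      method: "post",
      contentType: "application/json",
      headers: {
        "X-Goog-Api-Key": apiKey,
        // Only request the fields you need -- Places API bills per field tier.
        "X-Goog-FieldMask":
          "places.id,places.displayName,places.rating,places.userRatingCount,places.location",
      },
      // locationBias steers results toward a circle around the grid point.
      payload: JSON.stringify({
        textQuery: query,
        locationBias: {
          circle: {
            center: { latitude: lat, longitude: lng },
            radius: radiusMeters,
          },
        },
      }),
    },
  };
}

// In Apps Script the call would then be:
//   const req = buildSearchRequest(key, "cafe in H2T Montreal", 45.52, -73.60, 1000);
//   const resp = UrlFetchApp.fetch(req.url, req.options);
```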

API challenges

The Google Places API has two important restrictions that made it hard to search for every cafe in the city: a search returns at most 60 results, and each place returns only 5 reviews.

My first approach was to scope each API call to an individual neighbourhood, since the API supports text search. But even a neighbourhood proved to be too large an area, so I switched to Forward Sortation Areas (FSAs), the first 3 characters of a postal code, for a smaller grid. Searching a 1000m radius per FSA kept me below the 60-results-per-call cap most of the time, and gave me ~650 cafes in Montreal.

That left two issues to tackle: timeouts on my API calls, and FSA-dense areas like downtown. I updated the script to take an individual search radius per FSA, plus a few full postal codes, so I could search downtown in multiple smaller grids with only a 250m radius. This produced many duplicates, but those are easy to filter out downstream. While I spent some time improving the run time and allowing multiple runs to complete the search grid (by tracking progress and resuming each run from the previous stopping point), the end result still needed 4-5 runs to finish, and I'd get spammed with timeout notifications. Moving on.

For reviews, I could only get 5 per place and couldn't find a way to get more, so my current solution is to fetch a daily snapshot and slowly accumulate reviews per place. Eventually I can count new reviews in a short timeframe (e.g. a month) and sort by that count to find trending places, on the assumption that trending places get more reviews in a short window.
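The downstream steps can be sketched as two pure functions: deduplicating reviews across overlapping daily snapshots, then counting recent reviews per place. The snapshot field names (`placeId`, `reviewId`, `publishTime`) are assumptions about the schema; the logic is what matters.

```javascript
// Keep one copy of each review across overlapping daily snapshots.
// Rows are assumed to look like { placeId, reviewId, publishTime } (ISO dates).
function dedupeReviews(rows) {
  const seen = new Set();
  return rows.filter((r) => {
    const key = r.placeId + "|" + r.reviewId;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Count reviews newer than `sinceIso` per place, sorted descending,
// on the assumption that trending places accumulate reviews quickly.
function trendingPlaces(rows, sinceIso) {
  const counts = new Map();
  for (const r of dedupeReviews(rows)) {
    if (r.publishTime >= sinceIso) {
      counts.set(r.placeId, (counts.get(r.placeId) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([placeId, newReviews]) => ({ placeId, newReviews }))
    .sort((a, b) => b.newReviews - a.newReviews);
}
```

In practice the same dedup/window logic could live in BigQuery as a `QUALIFY ROW_NUMBER()` dedup plus a date-filtered `COUNT(*)`, with the sheet just displaying the result.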

Billing challenges

I've twice accidentally gone over the free tier, both times during testing while trying to optimize my search and calling the API too often. The first time was minor (a few dollars), which I paid. The second time, a flawed rewrite of my Apps Script function called the API a few thousand times, and I had to file a Google billing support request to get the charges adjusted :(

Map

To visualize the map search grid, I plotted the data I already had (lat/long + search radius) to see my coverage.
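Plotting a search circle on a lat/long map needs the radius converted from meters to degree offsets. A rough conversion, good enough for a coverage visualization (not for geodesy), looks like this; the helper name is illustrative.

```javascript
// Approximate conversion of a search radius in meters to degree offsets,
// for drawing coverage circles on a lat/long map. One degree of latitude
// is ~111,320 m everywhere; a degree of longitude shrinks with cos(latitude).
const METERS_PER_DEG_LAT = 111320;

function radiusToDegrees(radiusMeters, latDeg) {
  const latRad = (latDeg * Math.PI) / 180;
  return {
    dLat: radiusMeters / METERS_PER_DEG_LAT,
    dLng: radiusMeters / (METERS_PER_DEG_LAT * Math.cos(latRad)),
  };
}
```

At Montreal's latitude (~45.5°N) a 1000m radius is about 0.009° of latitude but roughly 0.013° of longitude, so circles drawn with a single degree value would look visibly squashed.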

Map Grid Visualization

Pausing API

While I look for ways to optimize/limit API usage in my pipeline, I've turned the API off to avoid additional costs. This includes the Google Maps API used for the visualizations.