Advances in location data continue to transform how businesses approach site planning. With insights from new data streams, it is now even possible to determine which sites are most likely to increase sales for seasonal, temporary, and mobile businesses.
Food trucks, a lunch-time staple for many, operate on a location-dependent business model. Generally speaking, food trucks offer similar lunch options for roughly the same price, which makes it difficult for businesses to differentiate themselves from nearby competitors. As a result, food truck location can determine whether a business succeeds or fails.
Recently, CARTO helped a local food truck business determine the prime spots for its trucks with revenue prediction models. The company provided one month’s worth of anonymized transaction data for each of its 10 food carts, and with this information a team of data scientists from CARTO was able to measure current performance, build increasingly confident revenue models, and, finally, predict the six best-performing food truck locations.
Measuring current performance
Before we could predict what locations should be selected to drive future sales, we first needed to figure out a way to measure the current performance of each site in Manhattan and Brooklyn.
To get started, Wenfei and Dongjie, two of CARTO’s data scientists, first aggregated the data by truck and by hour to measure the average spend per hour.
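The aggregation step can be sketched roughly as follows. This is a minimal illustration rather than the team's actual code: the record layout (truck ID, timestamp, amount), the sample values, and the function name `average_hourly_spend` are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transaction records: (truck_id, ISO timestamp, amount in USD).
transactions = [
    ("truck_01", "2018-06-04T08:15:00", 6.50),
    ("truck_01", "2018-06-04T12:05:00", 11.00),
    ("truck_01", "2018-06-04T12:40:00", 9.50),
    ("truck_01", "2018-06-05T12:20:00", 12.00),
    ("truck_02", "2018-06-04T12:10:00", 10.00),
    ("truck_02", "2018-06-05T12:30:00", 14.00),
]

def average_hourly_spend(records):
    """Return {(truck_id, hour_of_day): mean revenue across observed days}."""
    # Sum revenue per truck, per calendar day, per hour of day.
    daily_totals = defaultdict(float)
    for truck_id, timestamp, amount in records:
        ts = datetime.fromisoformat(timestamp)
        daily_totals[(truck_id, ts.date(), ts.hour)] += amount

    # Average those daily totals for each (truck, hour) pair.
    # Days with no sales in a given hour are not counted here.
    grouped = defaultdict(list)
    for (truck_id, _day, hour), total in daily_totals.items():
        grouped[(truck_id, hour)].append(total)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}

hourly = average_hourly_spend(transactions)
for (truck_id, hour), avg in sorted(hourly.items()):
    print(f"{truck_id} {hour:02d}:00  ${avg:.2f}")
```

Plotting these averages by hour for each truck would give the kind of hourly revenue curve described next.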
The graphs show that hourly revenue for each food truck usually peaks around lunch-time, although sometimes there are spikes in sales around breakfast-time as well. Next, Wenfei and Mamata, CARTO’s head of cartography, mapped food truck sales using proportional circles reflecting revenue amounts for each location across Manhattan and Brooklyn.
Now we want to figure out where the best locations are for increasing sales, which means we’ll need to identify some variables near and around our current locations that can serve as predictors in our revenue model. Traditionally, these predictors are identified using data from the census and points of interest (POI) data.
The demographic insights available from census data are helpful for segmenting target customers, but this use case illustrates one of the significant limitations of working with census data.
The census provides residential data for our area of operations, and in the image above this information is presented at the census tract level. However, many food truck customers are workers who commute into the city or tourists visiting New York landmarks, which is likely why Grand Central Station and Times Square are among the most profitable locations. As such, residential data offers few insights relevant to increasing sales among this target customer base.
POI data will be more useful here for finding patterns of nearby attractions around high-performing food trucks that can serve as a predictor for our models.
The first map shows every POI point in Manhattan and Brooklyn, but there’s so much noise that it’s hard to determine which attractions appear and reappear near and around each of our food trucks. Since many customers select food trucks based on proximity, buffers with a 200 meter radius (about a 2½ to 3 minute walk) were created around each cart so that predictor features could be identified more easily in the second map.
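To make the buffer step concrete, here is a small sketch that counts POIs by category within 200 meters of a cart using great-circle distance. The coordinates, category names, and the 1.2 m/s walking speed are assumptions chosen for illustration; the post does not say how the buffers were computed.

```python
import math

EARTH_RADIUS_M = 6_371_000
BUFFER_RADIUS_M = 200  # buffer radius stated in the post

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def poi_counts_in_buffer(cart, pois, radius_m=BUFFER_RADIUS_M):
    """Count POIs per category that fall inside the cart's buffer."""
    counts = {}
    for category, lat, lon in pois:
        if haversine_m(cart[0], cart[1], lat, lon) <= radius_m:
            counts[category] = counts.get(category, 0) + 1
    return counts

def walk_minutes(distance_m, speed_m_per_s=1.2):
    """Walking time in minutes; 1.2 m/s is an assumed typical pace."""
    return distance_m / speed_m_per_s / 60

# Hypothetical cart and POI coordinates (lat, lon).
cart = (40.7527, -73.9772)
pois = [
    ("transit", 40.7530, -73.9770),  # a few dozen meters away
    ("office", 40.7535, -73.9780),   # roughly 110 m away
    ("museum", 40.7614, -73.9776),   # close to 1 km away, outside the buffer
]

counts = poi_counts_in_buffer(cart, pois)
print(counts)
print(f"{walk_minutes(BUFFER_RADIUS_M):.1f} minutes to walk the buffer radius")
```

The per-category counts for each cart are the sort of feature that could then feed a revenue model.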
Building more precise models with new data streams
Now CARTO is ready to start building a gradient boosted regression (GBR) model that will allow us to determine which features from this data are most important when considering where to place food trucks. In short, the GBR model ranks features by importance, which provides a list of predictors to look for when considering a potential food truck location.
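For readers unfamiliar with the technique, the idea behind gradient boosting and its feature ranking can be sketched in plain Python. This is a deliberately tiny version that uses one-split decision stumps and squared-error loss; it is not the model CARTO built. The feature names, sample values, and function names below are hypothetical.

```python
def fit_stump(X, residuals):
    """Return the best one-feature split as (sse, feature, threshold, left_mean, right_mean)."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return best

def boosted_feature_importance(X, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on residuals; return each feature's share of total error reduction."""
    predictions = [sum(y) / len(y)] * len(y)  # start from the mean
    gains = [0.0] * len(X[0])
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [target - pred for target, pred in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature has two distinct values
            break
        sse_after, j, threshold, left_mean, right_mean = stump
        mean_residual = sum(residuals) / len(residuals)
        sse_before = sum((r - mean_residual) ** 2 for r in residuals)
        gains[j] += sse_before - sse_after
        # Shrink each stump's contribution by the learning rate.
        predictions = [
            pred + learning_rate * (left_mean if row[j] <= threshold else right_mean)
            for row, pred in zip(X, predictions)
        ]
    total = sum(gains)
    return [g / total for g in gains] if total > 0 else gains

# Hypothetical features per observation: foot traffic, POI count, unrelated noise.
feature_names = ["foot_traffic", "poi_count", "noise"]
X = [[10, 3, 7], [20, 1, 7], [30, 4, 1], [40, 1, 8],
     [50, 5, 2], [60, 9, 8], [70, 2, 1], [80, 6, 8]]
y = [5 * ft + 2 * poi for ft, poi, _ in X]  # synthetic revenue

importances = boosted_feature_importance(X, y)
for name, score in sorted(zip(feature_names, importances), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

In practice a library implementation is the more likely choice, for example scikit-learn's `GradientBoostingRegressor`, which exposes a `feature_importances_` attribute after fitting. The post does not say which implementation was used.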
The first revenue model was created using only traditional data sources, specifically census and POI data:
The GBR model returned an R-squared score, which measures, on a scale that typically runs from 0 to 1, how much of the variability in revenue the model explains, and so can be used to gauge confidence in the model. An R-squared score of .38 means the model accounts for only about 38% of that variability, so more data is needed to determine with greater confidence which features matter most when selecting a food truck location.
To improve the model, Mastercard spend data was added and the same calculations were performed to see whether the R-squared score would increase.
Mastercard spend scores provide aggregated and anonymized merchant-level transaction insights on where, when, and how people spend money. More specifically, the transaction percentile score provides an important measure of transaction frequency. Because most food carts offer similar types of food for around the same price, the frequency measure provides insights on customer volume for each cart.
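As a reminder of how the score is calculated, the sketch below computes R-squared as one minus the ratio of residual variation to total variation. The revenue figures are made up for the example and are unrelated to the food truck data.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total variation
    return 1 - ss_res / ss_tot

# Hypothetical weekly revenues and model predictions.
actual = [100, 200, 300, 400]
predicted = [150, 150, 350, 350]

score = r_squared(actual, predicted)
baseline = r_squared(actual, [250, 250, 250, 250])  # always predicting the mean
print(f"model: {score:.2f}, mean-only baseline: {baseline:.2f}")
```

A model that always predicts the average scores 0, and a perfect model scores 1, which is why a higher score signals that the added features are capturing more of what drives revenue.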
Here the R-squared score has increased by 18 points over model one, which makes sense and supports the earlier assumption behind the POI buffers: food trucks rely on foot traffic from nearby customers. Notably, the R-squared score improved as additional derivative data layers were added to the model. Without these new data streams, the company would not be in a position to identify with much confidence where the best locations are for each food truck.
The image above presents the 12 features that the model identified as having a statistically significant impact on food truck sales. The top four features were selected to serve as predictors for identifying new locations: (1) foot traffic from the previous hour, (2) foot traffic from the current hour, (3) day of the week, and (4) Mastercard frequency score.
Now it is time to start mapping the selected predictors across New York City using 100×100 meter grid tiles (roughly the size of a city block). Next, using a histogram, the company looked at the sales distribution across the city and calculated the weekly sales average per truck to be approximately $2,786.
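One simple way to assign observations to 100×100 meter tiles is sketched below. It assumes a local flat-earth approximation around a chosen south-west origin, which is reasonable at city scale. The origin, sample points, and foot traffic values are hypothetical, and the post does not describe how the actual grid was built.

```python
import math
from collections import defaultdict

TILE_SIZE_M = 100               # tile edge length stated in the post
METERS_PER_DEG_LAT = 111_320    # approximate length of one degree of latitude

def tile_index(lat, lon, origin_lat, origin_lon, tile_size_m=TILE_SIZE_M):
    """Return (column, row) of the tile containing the point, counted from the origin."""
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(origin_lat))
    x = (lon - origin_lon) * meters_per_deg_lon  # meters east of the origin
    y = (lat - origin_lat) * METERS_PER_DEG_LAT  # meters north of the origin
    return (math.floor(x / tile_size_m), math.floor(y / tile_size_m))

# Hypothetical south-west corner of the study area.
origin = (40.70, -74.02)

# Hypothetical samples: (lat, lon, foot traffic count).
samples = [
    (40.7004, -74.0195, 120),
    (40.7006, -74.0193, 80),
    (40.7010, -74.0180, 300),
]

# Average the predictor value within each tile.
values_by_tile = defaultdict(list)
for lat, lon, foot_traffic in samples:
    values_by_tile[tile_index(lat, lon, *origin)].append(foot_traffic)
tile_means = {tile: sum(v) / len(v) for tile, v in values_by_tile.items()}
print(tile_means)
```

Each tile then carries one value per predictor, which is what a revenue model can score across the whole city.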
Since the goal is to find new locations that are likely to increase sales revenue, the company selected the tiles at the higher end of the revenue distribution and then clustered them into revenue areas. Because the model’s R-squared score was .63, there’s not quite enough confidence to pinpoint the exact location for each truck. Instead, clustering locates regions within a neighborhood that have a higher likelihood of being profitable.
The image above shows the changes to the map that each of these operations yielded. In the end, six locations were identified, each with a revenue prediction. Below, the six locations are ranked from highest to lowest by weekly sales average.
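The select-then-cluster step might look something like the sketch below. The post does not name the clustering method, so this example assumes the simplest option: keep tiles above a revenue cutoff, then group tiles that touch each other. The predicted revenues, the 0.5 quantile cutoff, and the function names are all hypothetical.

```python
from collections import deque

def top_tiles(predicted, quantile=0.75):
    """Keep tiles whose predicted revenue is at or above a simple quantile cutoff."""
    ranked = sorted(predicted.values())
    cutoff = ranked[int(quantile * (len(ranked) - 1))]
    return {tile for tile, revenue in predicted.items() if revenue >= cutoff}

def cluster_adjacent(tiles):
    """Group tiles that touch, including diagonally, using breadth-first search."""
    unvisited = set(tiles)
    clusters = []
    while unvisited:
        start = unvisited.pop()
        cluster, queue = {start}, deque([start])
        while queue:
            col, row = queue.popleft()
            for dc in (-1, 0, 1):
                for dr in (-1, 0, 1):
                    neighbor = (col + dc, row + dr)
                    if neighbor in unvisited:
                        unvisited.remove(neighbor)
                        cluster.add(neighbor)
                        queue.append(neighbor)
        clusters.append(cluster)
    return clusters

# Hypothetical predicted weekly revenue per (column, row) tile.
predicted = {
    (0, 0): 5000, (1, 0): 5200, (1, 1): 5100,
    (5, 5): 4900, (6, 5): 4800,
    (3, 3): 1500, (2, 7): 1200, (8, 1): 900,
}

clusters = cluster_adjacent(top_tiles(predicted, quantile=0.5))

def mean_revenue(cluster):
    return sum(predicted[tile] for tile in cluster) / len(cluster)

# Rank revenue areas by their average predicted revenue.
ranked_areas = sorted(clusters, key=mean_revenue, reverse=True)
for area in ranked_areas:
    print(sorted(area), f"${mean_revenue(area):,.0f}")
```

Reporting an area rather than a single tile reflects the point made above: with a moderate R-squared score, a neighborhood-level recommendation is easier to defend than an exact street corner.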
- Corona Park: $6,128 weekly sales average
- Penn Station: $5,975 weekly sales average
- SoHo: $5,911 weekly sales average
- Grand Central Station: $5,766 weekly sales average
- West Village: $5,234 weekly sales average
- DUMBO: $5,193 weekly sales average
While there are the usual suspects on this list (Penn Station, Grand Central, etc.), it is surprising that Corona Park turns out to be the best location for increasing food truck sales revenue. When nearby tourist attractions and the area’s population density are taken into consideration, however, the results make sense.
A new era of site planning
New data streams are ushering in a new era of site planning, making previously impossible solutions possible. Indeed, as this food truck example highlights, the future of site planning depends on accessing and working with various types of data, from traditional sources to new derivative datasets, to identify, understand, and quantify the impact that mobility patterns will have on your sales revenue.
The blog was first published here.