The next-to-last phase of a citywide analytics project, and what many believe is the most important, is developing the analytics solution. I say "many" because, in my humble opinion, the last phase of a citywide analytics project (not the next to last) is the most important. But let's save that discussion for the next blog.
First, let's talk about what a citywide analytics solution encompasses. Most of the time, a citywide analytics project is driven by a multi-agency, multi-organization (including nonprofits, the private sector, etc.) citywide policy or initiative. Or, as in the example cited below, a citywide analytics project can be a subproject within a larger citywide initiative.
Citywide initiatives tend to add complexity to a project with regard to the entities involved. These complexities include the data needed for the project, the visibility of the project, the many invested users and stakeholders, and finally, the intended impact of the solution.
The process of identifying an analytics model that will be used to develop a solution can only begin once the problem has been identified. This process includes finding the data needed for the project as well as selecting an appropriate analytics model. For the sake of this article, when I mention the term "analytics model," I am referring to everything from a simple regression model to deep learning with neural networks. Data investigation and the identification of an analytics model go hand in hand. In most cases, identifying data sets and modifying the model will iterate multiple times during a project.
Here are the three things that you should always keep in mind when identifying the data and analytics model for an urban analytics project:
- Always be looking to answer the analytics question, nothing more;
- Make certain subject matter experts, policy stakeholders and domain leaders are in the room;
- The majority of the time, it’s all about location, location, location.
Here is how this looked in a real situation from my time as the Director of the New York City Mayor's Office of Data Analytics (MODA). Playing a key part in the NYC Mayor's effort to combat homelessness and low-income housing issues within the city, the NYC Commission on Human Rights (CHR) began moving toward a more proactive strategy for identifying landlords in any of the five boroughs who may be discriminating against potential renters based on their source of income. In the past, CHR served mostly as a mechanism for receiving reports of this alleged conduct. Its process relied on people reporting a possible violation, which would only happen after the alleged action had taken place.
CHR was looking to be more proactive, so with this strategy in mind it reached out to MODA to partner on implementing a data-driven strategy for income discrimination enforcement. The city problem was "how do we proactively drive down housing-based income discrimination?" The translated analytics problem was "how do we use citywide data to predict where a landlord may likely refuse a renter an opportunity to rent an apartment based on their use of a housing voucher?" After further investigation, this was broken down into even more granular analytics questions, such as:
- How do we geospatially define a NYC neighborhood?
- What data can we use to test the crime/schools/housing stock hypothesis?
- How can we group buildings into ownership portfolios?
- What do we know about each building/tax lot in NYC?
Essentially, the plan was to use location intelligence to identify neighborhoods that would be targeted for inspection. In this instance, we looked at Neighborhood Tabulation Areas (NTAs) as defined by the NYC Department of City Planning. We then looked to characterize each neighborhood by its likelihood of discrimination issues. In short, we were trying to characterize neighborhoods where renters would be happy to live, that had low-income housing, but surprisingly low or no instances of voucher usage.
For this, once we identified all of the NTAs across the city, five variables describing each NTA were used: population, mean Student Achievement Score, NYPD Felony Crime per Capita, total rental housing stock, and total count of Housing Choice Vouchers (HCVs) from the federal government. Once that was complete, we began to build ownership portfolios. Building a portfolio in this case was simply a way of grouping buildings (or rather lots, which is the ownership structure here in NYC) based on common variables.
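To make the idea concrete, here is a minimal sketch in Python of both steps: flagging candidate NTAs from the five variables, and grouping tax lots into ownership portfolios by a shared owner key. Every record, field name, and threshold below is invented for illustration; the article does not describe MODA's actual criteria or data schema.

```python
# Hypothetical NTA records using the five variables named above.
# All figures here are invented for illustration, not real NYC data.
ntas = [
    {"nta": "NTA-A", "population": 50000, "achievement": 3.4,
     "felony_per_capita": 0.010, "rental_units": 9000, "hcv_count": 800},
    {"nta": "NTA-B", "population": 42000, "achievement": 3.6,
     "felony_per_capita": 0.008, "rental_units": 7000, "hcv_count": 40},
    {"nta": "NTA-C", "population": 61000, "achievement": 2.1,
     "felony_per_capita": 0.030, "rental_units": 4000, "hcv_count": 500},
]

def flag_candidates(ntas, min_achievement=3.0, max_crime=0.015,
                    min_rentals=5000, max_voucher_rate=0.02):
    """Flag NTAs that look desirable (good schools, low crime, ample
    rental stock) yet show surprisingly low voucher usage."""
    flagged = []
    for n in ntas:
        voucher_rate = n["hcv_count"] / n["rental_units"]
        if (n["achievement"] >= min_achievement
                and n["felony_per_capita"] <= max_crime
                and n["rental_units"] >= min_rentals
                and voucher_rate <= max_voucher_rate):
            flagged.append(n["nta"])
    return flagged

# Hypothetical tax lots keyed by borough-block-lot (BBL) with an owner field.
lots = [
    {"bbl": "1-00010-0001", "owner": "ACME REALTY LLC"},
    {"bbl": "1-00010-0002", "owner": "ACME REALTY LLC"},
    {"bbl": "2-00220-0005", "owner": "B SMITH"},
]

def build_portfolios(lots):
    """Group tax lots into ownership portfolios by a shared owner key.
    In practice, owner names would need normalization and fuzzy matching."""
    portfolios = {}
    for lot in lots:
        portfolios.setdefault(lot["owner"], []).append(lot["bbl"])
    return portfolios

print(flag_candidates(ntas))        # ['NTA-B']: desirable, almost no vouchers
print(len(build_portfolios(lots)))  # 2 portfolios
```

The point of the sketch is the shape of the logic, not the thresholds: desirability filters plus an unexpectedly low voucher rate surface the anomaly, and a simple grouping key turns individual lots into landlord portfolios.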
CHR had "testers" who posed as potential candidates for housing; they would go to the targeted neighborhoods and test whether a landlord would commit income discrimination. The result was CHR filing 120 income discrimination complaints against landlords, the most in its history. If you want to read a more detailed version, inclusive of the outcome of this project, the Harvard Data-Smart initiative wrote a great article on this work.
You may have noticed that a lot of data inspection, analysis, and location intelligence work was done to execute this project. What you didn't hear me talk about was artificial intelligence, machine learning, big data, or any other fancy buzzwords. Sometimes those technologies come into play. What I am hoping this article gets across is that the analytics model should always be based on the need, not on the fancy technology.
This is how you develop the analytics solution. Do not decide on the analytics model you want to use first and then force it into the solution. In this case, we identified a city problem, teased out the analytics question, identified the data we would need to answer that question, then used it to solve the problem for the client. The old adage applies here: "the client is always right." Build your analytics solution to solve their problem.
Amen Ra Mashariki is part of the GovLoop Featured Contributor program, where we feature articles by government voices from all across the country (and world!). To see more Featured Contributor posts, click here.