Why zip codes are so bad for healthcare data and analytics.

Healthcare organizations, especially hospitals, focus on zip codes more than organizations in any other industry. It’s not their fault: regulations require them to report to governmental agencies by zip code and to define their service areas (primary, secondary, and tertiary) by zip code.

Those requirements serve a legitimate purpose, but they have created an undue focus and reliance on zip-based data for healthcare analysis, marketing, and strategic decision-making.

There are actually two good reasons why regulations require reporting at the zip code level:

A zip code is easy to determine for every patient, so reporting requirements at that level do not create a significant burden on providers of any size. Patients know and provide their zip code as a matter of course. They generally cannot provide, nor can every provider easily record, a census tract, zip+4, or latitude/longitude coordinates.
Zips provide adequate anonymity at the aggregate level. The reasoning is that there are enough people in each zip code to ensure that reporting aggregate statistics is unlikely to violate the privacy of any individual in that category.

This second reason is precisely why zip code data is so bad for internal analysis: it is not descriptive or discriminating enough.

Following are some of the weaknesses of zip codes for most analytic purposes:

They are large in terms of population. The average zip code in the U.S. has a population of approximately 8,000, but populations vary widely. In Illinois, for example, the most populous zip code has more than 50,000 people. Analysis at that level masks the insights that are available at a finer level of granularity.

They can be large in terms of geographic area and can be unusually shaped. The important consideration is that it is unrealistic to think a provider can serve an entire zip code uniformly or to determine which provider is best positioned to capture share at the zip level. For example, a hospital may have a 20% share of ED discharges in a zip code, but that share might vary significantly from east (40%) to west (10%) in that zip.
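The arithmetic behind that example can be sketched in a few lines of Python. The discharge counts below are hypothetical, chosen only so that the shares come out to the 40%, 10%, and 20% figures above; the sub-area names and the helper function are illustrative assumptions, not part of any reporting standard.

```python
# Hypothetical ED discharge counts for one zip code, split into two sub-areas.
# The counts are invented so the shares match the 40% / 10% / 20% example.
sub_areas = {
    "east": {"our_discharges": 40, "total_discharges": 100},
    "west": {"our_discharges": 20, "total_discharges": 200},
}


def share(ours, total):
    """Return market share as a fraction of total discharges."""
    return ours / total


# Share within each sub-area of the zip code.
sub_area_shares = {
    name: share(d["our_discharges"], d["total_discharges"])
    for name, d in sub_areas.items()
}

# Share for the zip code as a whole: pooled discharges, not an average of shares.
zip_share = share(
    sum(d["our_discharges"] for d in sub_areas.values()),
    sum(d["total_discharges"] for d in sub_areas.values()),
)

for name, s in sub_area_shares.items():
    print(f"{name}: {s:.0%}")
print(f"zip overall: {zip_share:.0%}")
```

The zip-level figure is a volume-weighted blend of the sub-area shares, so a report that shows only the 20% hides both the strong position in the east and the weak one in the west.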

Their boundaries do not follow county or town boundaries, so a single zip code can be in multiple counties. There are even thirteen zip codes that cross state boundaries.

They change relatively frequently. Boundaries can change, or a zip can be split to create new ones. Among the approximately 42,000 zip codes in the United States, there are approximately 4,700 changes each year.

The limitations above are a function of the zip code’s primary purpose: to efficiently deliver the mail. Their size and shape, their crossing of county and municipal boundaries, and their ongoing changes are all driven by efforts at efficiency. The USPS is not concerned with healthcare analytics.

There are geographic alternatives that are better:

The geographic scheme of the U.S. Census Bureau is designed for reporting. It offers more consistency in size (population), continuity with state, county, and municipal boundaries, and persistence of definition.

Geography	Number in US	Approximate population
State (and DC)	51	n/a
County	3,143	n/a
Census Tract	73,057	4,500
Block Group	257,362	1,275
Block	11,078,297	30

The Zip+4, while still subject to a lack of persistence, does have the advantage of a small geography and population. The average Zip+4 contains 5-12 mailing addresses. Interestingly, while credit data is regulated at the individual/household level to protect privacy, aggregate statistics at the Zip+4 level are available in an unregulated fashion.

When available, patient-specific or household-specific data is best for most purposes, from both a demographic and a locational perspective.

And the rule of thumb is this: always use the most granular data available for the specific data point or purpose.

For example, for site selection or forecasting demand, we prefer to avoid aggregated data altogether. We use address-level data to understand the proximity of various populations (whether at home, work, or school) as well as the proximity of competitors. We use individual- and household-specific demographic attributes to understand determinants of demand and ability to pay. You cannot assume that average values for an entire geography are equally distributed across the geography. We use zip code data only for data that is available only at that level, such as discharges by a hospital other than your own.
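As an illustration of what address-level proximity can look like in practice, here is a minimal Python sketch. It assumes each address has already been geocoded to latitude and longitude, and it uses straight-line (haversine) distance as a stand-in for drive time, which a real analysis would likely prefer. The function names, site records, and coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_site(lat, lon, sites):
    """Return the site (ours or a competitor's) closest to one address."""
    return min(
        sites,
        key=lambda s: haversine_miles(lat, lon, s["lat"], s["lon"]),
    )


# Hypothetical geocoded sites and one hypothetical household address.
sites = [
    {"name": "Our clinic", "lat": 41.90, "lon": -87.65},
    {"name": "Competitor clinic", "lat": 41.80, "lon": -87.60},
]
household = (41.88, -87.64)

closest = nearest_site(household[0], household[1], sites)
print(closest["name"])
```

Running this per household, rather than once per zip code centroid, is what lets two neighbors on opposite sides of the same zip code be assigned to different nearest sites.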

So there will be occasions and specific reasons to report or examine data at the zip code level, but don’t use that as an excuse. You will improve your insight and decision-making when you use the most granular data available for your purpose.

Feel free to reach out to us if you need help.
