# Health Systems Research & Development

## North Carolina Health Data Explorer

### Technical Notes

#### What are Quintiles?

Quintiles divide the data (100 county "observations") into five equal-sized groups. To compute the quintiles, the data is sorted so that the smallest value is first, the next smallest is second, and so on. The sorted data is then divided into five groups of equal size, called "quintiles". The first (lowest) quintile contains the 20 smallest observations, the second contains the next 20 observations, and so on. The middle (third) quintile is a good measure of the middle range of the data. The highest (fifth) quintile contains the top 20% of observations.
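
The sorting and grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the Explorer's actual code. The function name `assign_quintiles` is hypothetical, and the Explorer's rule for counties with tied values is not documented here; this sketch keeps tied values in their input order.

```python
def assign_quintiles(values):
    """Return a quintile number (1 = lowest, 5 = highest) for each value.

    Hypothetical helper: sorts the observations from smallest to largest,
    then splits the sorted list into five equal-sized groups. Results are
    returned in the same order as the input values.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    # Positions of the observations, ordered from smallest value to largest.
    # sorted() is stable, so tied values keep their original input order.
    order = sorted(range(n), key=lambda i: values[i])
    quintiles = [0] * n
    for rank, i in enumerate(order):
        # With n = 100, ranks 0-19 map to 1, ranks 20-39 map to 2, and so on.
        quintiles[i] = rank * 5 // n + 1
    return quintiles


# Usage with 100 made-up placeholder values (not real county data).
sample = [float(v) for v in range(100, 0, -1)]  # 100.0 down to 1.0
q = assign_quintiles(sample)
print(q[0], q[-1])                         # prints: 5 1
print([q.count(k) for k in range(1, 6)])   # prints: [20, 20, 20, 20, 20]
```

With 100 counties, each quintile holds exactly 20 counties; with a count that is not a multiple of five, the groups would differ in size by at most one.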

#### How does the scatter plot work?

The scatter plot in the Double Map shows the correlation between the variable in Map 1 (plotted along the Y axis) and the variable in Map 2 (plotted along the X axis). By convention, the variable along the Y axis is the dependent variable (the outcome variable), and the variable along the X axis is the independent variable (the one that theoretically "causes" the outcome). The user can choose which data to plot on the X and Y axes by clicking on the data buttons and selecting an indicator. For instance, one might theorize that counties with high obesity rates also have a higher prevalence of diabetes.

To plot this relationship, set the "Select Data" drop-down menu on Map 1 (the top map) to diabetes prevalence. By default, this also plots diabetes prevalence along the Y axis (vertical) of the scatter plot. Then set the "Select Data" drop-down menu on Map 2 (the bottom map) to the obesity rate. By default, this plots obesity along the X axis (horizontal). The scatter plot now shows the association between the obesity rate and the prevalence of diabetes in North Carolina counties. In general, this is a positive relationship: as the obesity rate increases across NC counties, the prevalence of diabetes also increases.

The scatter plot summarizes the relationship between the two variables with a linear regression model. The model estimates the line that best fits the points on the scatter plot. The equation of the line is displayed at the top of the graph, and the line itself is drawn on the graph. The number in front of X is the slope of the line: the average change in the Y variable that accompanies a one-unit change in the X variable. If the slope is negative, X and Y have an inverse (negative) relationship, i.e., as X goes up, Y goes down.

The correlation coefficient (r) indicates how well a straight line describes the relationship between the two variables. The absolute value of r ranges from 0 to 1, with higher values indicating a stronger association, and the sign of r matches the sign of the slope. If r is close to zero, there is little linear relationship between the two variables; if its absolute value is close to one, the relationship is close. When a variable is correlated against itself (the same variable is plotted on both the X axis and the Y axis), r is always exactly +1. The square of r (R²) is the coefficient of determination, which describes how much of the variation in the dependent variable (Y) the model explains. For example, r = 0.5 yields R² = 0.25, meaning the independent variable (X) explains 25% of the variation in Y; r = 0.9 yields R² = 0.81, or 81% of the variation.
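
To make the slope, r, and R² concrete, here is a minimal ordinary least-squares sketch in Python. It assumes the Explorer fits a standard least-squares line, which the text implies but does not state; the function name `fit_line` and the sample numbers are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Hypothetical helper. Returns (slope, intercept, r, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx                      # average change in Y per unit of X
    intercept = mean_y - slope * mean_x    # line passes through (mean_x, mean_y)
    r = sxy / math.sqrt(sxx * syy)         # same sign as the slope
    return slope, intercept, r, r * r      # r * r is R-squared


slope, intercept, r, r2 = fit_line([1, 2, 3, 4], [1, 3, 2, 4])
print(f"Y = {slope:.2f}X + {intercept:.2f}, r = {r:.2f}, R^2 = {r2:.2f}")
# prints: Y = 0.80X + 0.50, r = 0.80, R^2 = 0.64
```

Passing the same list as both `x` and `y` returns r = 1, matching the self-correlation case described above, and a downward-sloping dataset returns a negative slope and a negative r.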

#### How Does the Multivariate Map work?

To get a feel for using the Multivariate Map with multiple variables, start by looking at the relationship between just two variables. Click on the Y axis button and pick a variable, then click on the X axis button and pick another variable. Then click on the "Map (color) Variable" button and pick "none", and click on the "Size Variable" button and pick "none". The resulting colorless scatter plot shows the relationship between the Y and X variables only. For instance, pick "diabetes prevalence" for the Y variable and "poverty" for the X variable. This scatter plot shows a positive relationship between poverty and diabetes (higher poverty rates are associated with higher rates of diabetes). The strength and direction of the relationship are shown by the regression equation and the correlation coefficient.

Now add two more variables. Keep the same X and Y variables, but click on the "Map (Color) Variable" button and pick "geographic regions". The application then colors each dot according to the region its county is in: Eastern North Carolina, the Piedmont, or Western North Carolina. The scatter plot still shows the relationship between the X and Y variables, but now you can see whether counties in one region or another cluster at the high end, at the low end, or not at all. (**Note:** Only the color variable produces a result for regions, because regions are a categorical variable.)

Next, click on the "Size Variable" button and pick "Population" (for this example, make sure you are using the Multivariate Map from Series 2: Social Life and Economy). The application automatically scales each dot in proportion to the population of its county: counties with large populations get big dots, and counties with small populations get small dots. Look at the scatter plot again to see whether the big counties cluster at one end of the scatter plot or the other. The scatter plot still shows the relationship between poverty and diabetes, but adding the population variable lets you see whether that relationship appears to vary with population size.
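
The text says dot size is proportional to county population but does not say whether the Explorer scales a dot's radius or its area. The Python sketch below assumes the common convention of making each dot's area proportional to population, so the radius grows with the square root of population. The names `dot_radii` and `max_radius` are hypothetical, and the radii are in arbitrary display units.

```python
import math


def dot_radii(populations, max_radius=20.0):
    """Return a radius for each county so that dot area is proportional
    to population (hypothetical helper; units are arbitrary).

    The most populous county gets max_radius; every other county's radius
    is scaled by the square root of its share of that population.
    """
    largest = max(populations)
    if largest <= 0:
        raise ValueError("populations must include a positive value")
    return [max_radius * math.sqrt(pop / largest) for pop in populations]


# Made-up populations: the second county has four times the population
# of the first, so its dot gets twice the radius and four times the area.
print(dot_radii([25_000, 100_000]))  # prints: [10.0, 20.0]
```

Scaling the radius directly by population would make a county with four times the population look sixteen times larger on screen, which exaggerates the biggest counties.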

You can experiment with any of the variables in the application. Just keep in mind that the regression line on the scatter plot is always determined by the relationship between the X and Y variables. The X variable is always treated as the independent (or "causal") variable, and the Y variable as the dependent (or "outcome") variable. Then look for interesting patterns in the Size and Color variables. Do the big dots (generally the large values) cluster toward the right or the left of the plot, or do they not seem clustered at all? Do the darkly shaded dots (generally the large values) cluster above or below the line?

In the scatter plot you just made, change the "Map (color) Variable" to "none" and the "Size Variable" to "Percent white". Now the scatter plot shows the relationship between poverty and diabetes, with counties that have a high percentage of white residents shown as big dots and counties with a low percentage of white residents shown as small dots. Do you notice any clustering? What if you change the "Size Variable" to "Percent black"? What sorts of differences might this indicate in the relationship between poverty and diabetes for different racial groups?

#### How can I include regions in my map?

There are several ways to include regions in your map. To filter your data by region, use the "Filter by Region" button at the top of the application. (On the Multivariate Map, the "Filter" button is in the top right corner.) When you pick a region from the drop-down menu, the application includes data from only that region and excludes data from all other North Carolina regions.

On the Double Map and Multivariate Map, when you filter by region, the scatter plot displays the relationship between the X and Y variables for only that subset of the data.

To remove the region filter, scroll to the bottom of the drop-down menu and pick "remove region."

On the Multivariate Map, you can also include regions as a variable by picking "geographic regions" for the "Map (Color) Variable." When you do this, the Health Data Explorer retains all the counties but sorts the data by region and assigns a different color to each region. The user can then eyeball the data for regional trends and patterns.

#### How can I download the data?

The data from the Health Data Explorer can be downloaded by disease. To access the data, go to the Simple Map and scroll down to the Single Disease links at the bottom of the page. Then click the "link to data" button.

This opens an Excel file with mortality rates by county. Mortality data for each disease are available through that disease's single map application. The social, economic, and environmental data are available through the Other Category links. Users should cite the data as follows:

North Carolina Health Data Explorer. Center for Health Systems Research and Development, East Carolina University, Greenville, NC, 2011.

Users may also want to verify the social, economic, and environmental data against their original sources to make sure the data are up to date (all sources are listed in the data file and in the data source link in the application).

#### How can I look at the data over time?

Open the Time Series Application and use the "Select Data" button to choose a disease. Pick a year, then click a county on the map; the selected county is shown in orange on the Time Series Chart. To compare this county to another county, roll over the other county on the map; the second county is shown in turquoise on the Time Series Chart. To compare the county to a region or the state, roll over that region on the Region/State/US table. To compare a county to its peer counties**, click on the Filter by Region button and scroll down until you find the county of interest. Click on its peer group, and the Explorer will display only that group's data. Roll over the counties on the bar chart to display their 10-year trend lines on the map.

**Peer counties are the three or four other counties in the state that are considered similar to the target county, based on health needs and risk factors. Peer counties are described in more detail in the NC-CATCH Training Manual, page 14, which can be found at this website: http://www.schs.state.nc.us/SCHS/catch