Lab 6

Spatial statistics

Overview

In this assignment you will:

  • Perform quadrat analysis by hand
  • Evaluate how well a dataset conforms to a statistical test’s assumptions (specifically nearest neighbor analysis)
  • Determine if a statistical result is valid
  • Interpret p-values and test statistics

Part I: Quadrat analysis

  1. Observe the point pattern below. Make a prediction about whether it is clustered, dispersed, or random. (1 pt.)

  2. Next, compute the variance of the quadrat cell counts by hand, and show your work. You may upload a picture of your work, or you may put the cell counts and arithmetic in your word processing document. (2 pts.)

  3. What is the variance-to-mean ratio? (1 pt.)

  4. Is the process clustered, random, or dispersed? (1 pt.)
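The variance and variance-to-mean ratio (VMR) arithmetic above can be checked with a short Python sketch. It assumes the sample variance (denominator n − 1), which is the convention used by Python's `statistics.variance`; if your course uses the population variance (denominator n), swap in `statistics.pvariance`. The counts, the `quadrat_vmr` and `classify_vmr` helpers, and the ±0.1 tolerance band are all hypothetical illustrations, not the point pattern in this lab.

```python
from statistics import mean, variance


def quadrat_vmr(counts):
    """Return (mean, sample variance, variance-to-mean ratio) of quadrat counts."""
    m = mean(counts)
    s2 = variance(counts)  # sample variance: sum of squared deviations / (n - 1)
    return m, s2, s2 / m


def classify_vmr(vmr, tol=0.1):
    """Rough reading of the VMR; the +/- tol band around 1 is a hypothetical choice."""
    if vmr > 1 + tol:
        return "clustered"   # variance exceeds mean: points bunch into a few cells
    if vmr < 1 - tol:
        return "dispersed"   # variance below mean: counts are very even
    return "random"          # variance close to mean, as for a Poisson process


# Hypothetical counts for a 3 x 3 grid of quadrats (not the lab's pattern).
counts = [0, 2, 1, 5, 0, 4, 0, 3, 3]
m, s2, vmr = quadrat_vmr(counts)
chi2 = (len(counts) - 1) * vmr  # index-of-dispersion statistic, df = n - 1
print(m, s2, vmr, chi2, classify_vmr(vmr))
```

By hand, these nine counts sum to 18, so the mean is 2; the squared deviations sum to 28, giving a sample variance of 28 / 8 = 3.5 and a VMR of 1.75, which points toward clustering. The chi-square line shows one common formal version of the test, which you would compare against a chi-square table with n − 1 degrees of freedom.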

Part II: Nearest neighbor analysis

Copy the following files into your folder on the Q:\ drive, and import them into an ArcGIS Pro project.

  • Q:\StudentCoursework\Haffnerm\DATA\tweets2.csv
  • Q:\StudentCoursework\Haffnerm\DATA\ne_10m_admin_0_countries.shp (be sure to copy all files associated with the shapefile and not just the .shp!)

Create a separate layer for each of the following:

  • All tweets in Finland
  • All tweets in Spain

Nearest neighbor analysis is based on distances between points, so you will need to reproject the data into a projected coordinate system before running it. Use the following EPSG codes:

  • Tweets in Finland: EPSG:2393
  • Tweets in Spain: EPSG:2062
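One way to see why unprojected longitude/latitude coordinates distort nearest neighbor distances: a degree of longitude shrinks with the cosine of latitude, so "one degree" covers very different ground in Spain and Finland. The sketch below assumes a spherical Earth and uses 111.32 km as the approximate length of one degree at the equator; the function name is hypothetical.

```python
import math

KM_PER_DEG_AT_EQUATOR = 111.32  # approximate; assumes a spherical Earth


def km_per_degree_longitude(lat_deg):
    """Approximate east-west ground length of one degree of longitude."""
    return KM_PER_DEG_AT_EQUATOR * math.cos(math.radians(lat_deg))


for place, lat in [("central Spain", 40.0), ("central Finland", 64.0)]:
    print(f"{place} ({lat} N): {km_per_degree_longitude(lat):.1f} km per degree of longitude")
```

A degree of longitude is roughly 85 km at 40°N but only about 49 km at 64°N, while a degree of latitude stays near 111 km. Distances measured in raw degrees therefore stretch east-west gaps relative to north-south ones, and the distortion differs between the two countries; the projected EPSG codes above express coordinates in linear units instead.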

Next, you will compute nearest neighbor analysis on each subset (i.e. layer) of tweets to determine if they follow a dispersed, clustered, or random spatial pattern in each location. However, before doing so, look at the raw data and make a prediction about the spatial patterns in the two countries.

  1. Do you think each is dispersed, clustered, or random? (1 pt.)

  2. Briefly discuss the spatial patterns of tweets in the two countries. (1 pt.)

  3. Compute nearest neighbor analysis on both subsets (i.e., layers) of tweets using the minimum/maximum x,y values as the bounding box for the study area (this is the default in both ArcGIS Pro and QGIS). Create a table reporting the following for each country (3 pts.):

    1. Nearest neighbor index
    2. z-score
    3. The conclusion (i.e. the word “dispersed”, “clustered”, or “random” based on the test results)
  4. Make a statement about the reasons for the test results in each location, and compare the two countries' results. That is, why did the results turn out the way they did in each place? Did they align with your expectations? (2 pts.)

  5. Why is the geolocated Twitter content in Finland more dispersed than that of Spain? Is this surprising based on the population distributions of the two countries? Examine the raw data (i.e., the individual tweets), particularly those in Finland, to answer this. (1 pt.)

  6. Compute nearest neighbor analysis on the tweets in Finland one more time, but instead of using the minimum/maximum x,y values as the study area, compute the area of Finland and enter this value as the study area in ArcGIS Pro. Report the nearest neighbor index, z-score, and conclusion based on this method, and compare them to your previous results. Did this change your conclusion? (2 pts.)

  7. How do the study areas in these cases influence the analyses? How could Spain's borders in particular cause problems? Do you believe they affected the test conclusion in this case? (2 pts.)
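To make the role of the study area concrete, here is a minimal Python sketch of the Clark and Evans average nearest neighbor statistic. To our understanding these are the same formulas ArcGIS Pro's Average Nearest Neighbor tool documents: observed mean distance D_o, expected distance D_e = 0.5 / sqrt(n / A), standard error SE = 0.26136 / sqrt(n² / A), index NNI = D_o / D_e, and z = (D_o − D_e) / SE. The function names, the 1.96 cutoff, and the grid data are hypothetical; the brute-force neighbor search is only suitable for small point sets, and the points must already be in projected coordinates.

```python
import math
from statistics import mean


def nearest_neighbor_distances(points):
    """Brute-force O(n^2): distance from each point to its nearest other point."""
    dists = []
    for i, (xi, yi) in enumerate(points):
        best = math.inf
        for j, (xj, yj) in enumerate(points):
            if i != j:
                best = min(best, math.hypot(xi - xj, yi - yj))
        dists.append(best)
    return dists


def average_nearest_neighbor(points, area=None):
    """Return (nearest neighbor index, z-score, two-sided p-value).

    points: projected (x, y) coordinates, e.g. metres; not longitude/latitude.
    area:   study-area size in the same squared units; if None, the area of
            the min/max x,y bounding box of the points is used.
    """
    n = len(points)
    if area is None:
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    observed = mean(nearest_neighbor_distances(points))  # observed mean NN distance
    expected = 0.5 / math.sqrt(n / area)                 # expected under randomness
    std_err = 0.26136 / math.sqrt(n * n / area)
    z = (observed - expected) / std_err
    p = math.erfc(abs(z) / math.sqrt(2))                 # two-sided normal p-value
    return observed / expected, z, p


def classify(z, critical=1.96):
    """Conclusion at the 0.05 level (two-sided); the cutoff is an assumption."""
    if z < -critical:
        return "clustered"
    if z > critical:
        return "dispersed"
    return "random"


# Usage: a regular 10 x 10 grid with 1-unit spacing (hypothetical data).
grid = [(float(i), float(j)) for i in range(10) for j in range(10)]
for label, a in [("bounding box", None), ("area = 10000", 10000.0)]:
    nni, z, p = average_nearest_neighbor(grid, area=a)
    print(f"{label}: NNI={nni:.3f}, z={z:.2f}, p={p:.3g}, {classify(z)}")
```

With the default bounding box the grid is reported as dispersed (NNI of about 2.22). Declaring a much larger study area of 10,000 square units leaves the observed distances unchanged but raises the expected distance, which grows with the square root of A, so the same points are now reported as clustered (NNI of 0.2). This is the mechanism behind comparing the bounding-box result for Finland with the result that uses Finland's actual area. Note that the sketch fails if the points are collinear, because the bounding-box area is then zero.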