Overview of Data

Data Table

In my analysis, I will treat the environmental data (i.e. subregion, habitat type, depth, underwater visibility, mean summer water temperature, mean summer dissolved oxygen, and mean summer salinity) as the predictor variables. The species occurrence data, as well as the derived metrics for species richness and diversity, will be treated as the response variables; they are all continuous. The predictor variables were not manipulated in an experiment; they were simply observed. Subregion and habitat type are the only categorical predictor variables; the rest are continuous.

Table 1: Abbreviated data table. ID, SITE_NR, and SUBREGION_NR describe sampling site location. The next 255 columns contain species count data for each site. RICHNESS gives species richness and DIVERSITY gives Simpson's diversity index. The last six columns contain environmental data for each site: habitat type, water depth, underwater visibility, mean summer water temperature, mean summer dissolved oxygen, and mean summer salinity.

Screen Shot 2022-10-27 at 10.41.36.png
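As a rough illustration of how the two derived columns could be computed from one site's species counts, here is a minimal Python sketch. It assumes that RICHNESS is the number of species with a non-zero count and that DIVERSITY uses the Gini-Simpson form, 1 - sum(p_i^2); the exact variant of Simpson's index used for the dataset is not stated here, so treat that choice, the function names, and the example counts as assumptions.

```python
def species_richness(counts):
    """Number of species with a non-zero count at one site."""
    return sum(1 for c in counts if c > 0)


def simpson_diversity(counts):
    """Gini-Simpson index, 1 - sum(p_i^2).

    Assumed form: the dataset may instead use Simpson's D or 1/D.
    """
    total = sum(counts)
    if total == 0:
        # No fish recorded at this site: report zero diversity (an assumption).
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical counts for one site across four species.
site_counts = [10, 0, 5, 5]
richness = species_richness(site_counts)    # 3 species present
diversity = simpson_diversity(site_counts)  # 1 - (0.25 + 0.0625 + 0.0625) = 0.625
print(richness, diversity)
```

Under this form, a site dominated by a single schooling species scores close to 0, while a site with many evenly abundant species scores close to 1.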

Species Frequencies

To explore species frequencies, I summed the occurrence values across all sites for each species and plotted these total occurrence values (Fig 1). As it is impossible to read all 255 species labels, I created a second plot to view only the top 20 species (Fig 2). To clean the data, 23 fish species that had a total occurrence value of zero were removed from the dataset.
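The steps described above (sum per species, drop species never observed, keep the most frequent ones) can be sketched in a few lines of Python. This is only an illustration: the function names, the species keys, and the counts are hypothetical, and the original analysis may have used different tooling.

```python
def total_occurrence(site_rows, species):
    """Sum each species' occurrence value over all sites."""
    return {sp: sum(row[sp] for row in site_rows) for sp in species}


def drop_absent(totals):
    """Remove species whose total occurrence value is zero."""
    return {sp: t for sp, t in totals.items() if t > 0}


def top_n(totals, n=20):
    """Return the n species with the largest totals, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical data: two sites, three species.
sites = [
    {"sp_a": 120, "sp_b": 0, "sp_c": 3},
    {"sp_a": 80, "sp_b": 0, "sp_c": 0},
]
species = ["sp_a", "sp_b", "sp_c"]

totals = total_occurrence(sites, species)  # sp_b sums to zero
kept = drop_absent(totals)                 # sp_b is removed
top = top_n(kept, n=2)
print(top)
```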

spec_freq.png
top20-spec-freq.png

Fig 1: Total occurrence value for each species of reef fish.

Fig 2: Total occurrence value for the top 20 species.

We see that the occurrence data are zero-inflated, but a small number of species have very large total occurrence values. The species with the highest total occurrence value is Coryphopterus personatus, the masked goby, which is a very small fish (max length ~4 cm) that forms large schools. Members of the Menidia genus, Stegastes partitus, and Thalassoma bifasciatum are also schooling fish. This explains their extremely large occurrence values and suggests that these numerical outliers are biologically plausible and should not be removed.

Species Richness & Diversity

The following plots were made to visually identify possible relationships between the response variables (species richness and diversity) and the continuous environmental predictors. Species richness appears to be slightly positively correlated with all three of the environmental factors plotted below (Fig 3). This may be affected by the uneven distribution of sample sites across different depths and levels of salinity and dissolved oxygen. Simpson's Diversity Index appears to be slightly negatively correlated with water depth, and largely uncorrelated with either salinity or dissolved oxygen (Fig 4). As salinity increases, the variation in diversity across sites decreases. Diversity is most variable at moderate levels of dissolved oxygen. Again, these observations may be due to the fact that most surveys were done at sites with low salinity and moderate dissolved oxygen. Mean summer water temperature was not found to be correlated with either response variable.

rich~depth.png
rich~sal.png
rich~do.png

Fig 3: Species richness values based on site water depth (left), mean summer salinity (centre), and mean summer dissolved oxygen (right). Linear trendline (black) with 95% confidence interval (grey).

div~depth.png
div~sal.png
div~do.png

Fig 4: Simpson's Diversity Index values based on site water depth (left), mean summer salinity (centre), and mean summer dissolved oxygen (right). Linear trendline (black) with 95% confidence interval (grey).
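The linear trendlines and the informal "slightly correlated" readings in Figs 3 and 4 correspond to an ordinary least-squares fit and a Pearson correlation coefficient. The sketch below shows both in plain Python under that assumption; the data are made up, the function names are hypothetical, and the 95% confidence band drawn in the figures is not computed here.

```python
from math import sqrt


def _centered_sums(x, y):
    """Return (sxy, sxx, syy, mean_x, mean_y) for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy, mx, my


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    sxy, sxx, syy, _, _ = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)


def linear_trend(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    sxy, sxx, _, mx, my = _centered_sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical depths (m) and species richness at four sites.
depth = [1.0, 2.0, 3.0, 4.0]
richness = [3.0, 5.0, 7.0, 9.0]

slope, intercept = linear_trend(depth, richness)
r = pearson_r(depth, richness)
print(slope, intercept, r)
```

A coefficient near zero, as reported for mean summer water temperature, would show up here as a nearly flat fitted line.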

Species richness and Simpson's Diversity Index, my derived response variables, are explored here in relation to my categorical predictor variables: subregion and habitat type. These derived metrics allowed me to simplify my analyses to one response variable and one predictor with multiple levels.

Differences in species richness between habitat types were observed (Fig 5 - left). Isolated patch reefs with low vertical relief had the lowest minimum value of species richness; however, rubble reefs with low vertical relief had the lowest median and maximum values. Comparing levels of vertical relief for contiguous reefs, isolated patch reefs, and spur-groove reefs, we see that as vertical relief increases from low to high, species richness also increases. This makes ecological sense, as it is well known that reef fishes tend to prefer reefs with high 3-dimensional structural complexity.

Median values of Simpson's Diversity Index do not show large differences as a function of habitat type (Fig 5 - right). Several habitat types, however, differ in the range of the diversity index. For example, rubble reefs have a very narrow distribution, while contiguous reefs with high vertical relief have a much wider distribution. Apart from rubble reefs, each distribution is negatively skewed to some degree, and all outliers are sites with low diversity. These sites with extremely low diversity could be identified and targeted by management and restoration efforts.

rich-by-habitat.png

Fig 5: Species richness (left) and Simpson's Diversity Index (right) by habitat type.

The subregion in which a site is found appears to have little effect on species richness or the diversity index (Fig 6). The median value for each subregion falls within the inter-quartile range of the other subregions. Additionally, the variance within subregions looks large relative to the variance between subregions. Species richness in each subregion seems to be approximately normally distributed with little skew, while diversity in each subregion is negatively skewed with many low outliers.

rich-by-subregion.png

Fig 6: Species richness (left) and Simpson's Diversity Index (right) by subregion. 
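The visual comparison above (does one subregion's median fall inside another's inter-quartile range?) can also be checked numerically. The following is a minimal sketch, assuming rows keyed by the SUBREGION_NR and RICHNESS columns named in Table 1; the values are hypothetical, the helper names are made up for illustration, and the quartile method ("inclusive") is an assumption that may differ slightly from the one used to draw the boxplots.

```python
from statistics import quantiles


def group_values(rows, group_key, value_key):
    """Collect the values of value_key into lists keyed by group_key."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return groups


def box_summary(values):
    """First quartile, median, and third quartile of a list of values."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": med, "q3": q3}


def median_within_iqr(a, b):
    """True if the median of summary a lies inside the IQR of summary b."""
    return b["q1"] <= a["median"] <= b["q3"]


# Hypothetical sites in two subregions.
rows = [{"SUBREGION_NR": 1, "RICHNESS": v} for v in (10, 20, 30, 40, 50)]
rows += [{"SUBREGION_NR": 2, "RICHNESS": v} for v in (25, 30, 35, 40, 45)]

groups = group_values(rows, "SUBREGION_NR", "RICHNESS")
summaries = {g: box_summary(v) for g, v in groups.items()}
print(median_within_iqr(summaries[1], summaries[2]))
```

The same helpers could be applied to habitat type by changing the grouping key.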
