November 19, 2015

Hypothesis testing in food webs

And now, for something totally different

Get all the ingredients

R packages

What do we need?

library("igraph") # To handle the webs objects and plot them
library("magrittr") # Provides pipes ( %>% )
library("NetIndices") # To compute ecological indices
library("dplyr") # Filter, select, mutate, ... data sets
library("reshape2") # To organise and 'shape' our data sets
library("ggplot2") # Nice plots

options(stringsAsFactors = FALSE) # Ask me why

Ingredients: interactions data

The best stuff is Open Access

Ryan F. Hechinger, Kevin D. Lafferty, John P. McLaughlin, Brian L. Fredensborg, Todd C. Huspeni, Julio Lorda, Parwant K. Sandhu, Jenny C. Shaw, Mark E. Torchin, Kathleen L. Whitney, and Armand M. Kuris (2011). Food webs including parasites, biomass, body sizes, and life stages for three California/Baja California estuaries. Ecology 92:791—791. http://dx.doi.org/10.1890/10-1383.1

Web.source <- "http://esapubs.org/archive/ecol/E092/066/CSMweb_Links.txt"

Ingredients: interactions data

The best stuff is Open Access

Web.source %>%
  url %>% # Open the connection
  # and read the data
  read.csv(sep = "\t") -> Raw.web.data

Ingredients: interactions data

What's in there?

Raw.web.data %>% names
##  [1] "PresentAtCSM"              "PresentAtEPB"             
##  [3] "PresentAtBSQ"              "PresentAtAnyEstuary"      
##  [5] "ConsumerNodeID"            "ResourceNodeID"           
##  [7] "ConsumerSpeciesID"         "ResourceSpeciesID"        
##  [9] "ConsumerSpeciesID.StageID" "ResourceSpeciesID.StageID"
## [11] "LinkTypeID"                "LinkType"                 
## [13] "LinkEvidence"              "LinkEvidenceNotes"        
## [15] "LinkFrequency"             "LinkN"                    
## [17] "DietFraction"              "ConsumptionRate"          
## [19] "VectorFrom"                "PreyFrom"

Ingredients

Knead the data to a web (1/2)

"http://esapubs.org/archive/ecol/E092/066/CSMweb_Links.txt" %>%
  url %>% # Open the connection
  read.csv(sep = "\t") %>% # Read in the raw data
  filter(LinkType %in% "predation") %>% # Filter only the relevant interactions
  filter(LinkEvidence %in% "observed") %>% # and only if directly observed
  select(ResourceNodeID,ConsumerNodeID) %>% # Select the column of From and To
  graph.data.frame(directed=TRUE) -> CSM.graph # And build a igraph web

Ingredients

Knead the data to a web (2/2)

What do we have?

CSM.graph %>% vcount # How many vertices in the graph?
## [1] 80
CSM.graph %>% ecount # How many edges in the graph?
## [1] 242

Connectance:

Link.density <- function(x) x %>% {ecount(.) / vcount(.)^2}
CSM.graph %>% Link.density
## [1] 0.0378125

Null models and observed food webs

Which is which?

Hard!

Which is which?

Better?

Which is which?

What do we look at?

Null models

Null models

The grandfather of random webs (1/3)

Paul Erdős and Alfréd Rényii. (1959)
On Random Graphs. I Publicationes Mathematicae 6: 290–297.

  • \(n\) vertices
  • each link has i.i.d. probability \(p\)

Hence, each and every graph with \(M\) links appears with probability \[p^M \left( 1 - p \right)^{n^2 - M}\]

Null models

ErdÅ‘s–Rényii model (2/3)

CSM.graph %>%
  erdos.renyi.game(n = vcount(.), # Random graph with the same number of nodes...
                   p.or.m = Link.density(.), # ...and link density as the observed
                   type = "gnp", # This is to tell igraph we specify link density
                   directed = T, # We want a directed graph...
                   loops = F # ...with no loops
                  ) -> Random.graph

Null models

ErdÅ‘s–Rényii model (3/3)

Shorter version:

CSM.graph %>%
  {erdos.renyi.game(vcount(.), # number of vertices
                    Link.density(.), # link density
                   "gnp", T, F) # type, directed, and no loops
  } -> Random.graph

Notice the curly brackets { } around the game.

Null models

So many many games

  • aging.ba.game
  • aging.barabasi.game
  • aging.prefatt.game
  • asymmetric.preference.game
  • barabasi.game
  • bipartite.random.game
  • callaway.traits.game
  • cited.type.game
  • degree.sequence.game
  • erdos.renyi.game
  • establishment.game
  • forest.fire.game
  • grg.game
  • growing.random.game
  • hrg.game
  • interconnected.islands.game
  • k.regular.game
  • sbm.game
  • static.fitness.game
  • static.power.law.game
  • watts.strogatz.game

A Little Something Special

Something special in the degree?

For the observed

CSM.graph %>% degree(mode = "total") -> CSM.degree.tot
CSM.graph %>% degree(mode = "in") -> CSM.degree.in
CSM.graph %>% degree(mode = "out") -> CSM.degree.out

And the random

Random.graph %>% degree(mode = "total") -> Random.degree.tot
Random.graph %>% degree(mode = "in") -> Random.degree.in
Random.graph %>% degree(mode = "out") -> Random.degree.out

Something special in the degree?

Exploratory plots

Prepare the data…

CSM.degrees <- data.frame("graph.name" = "CSM",
                       "degree.tot" = CSM.degree.tot,
                        "degree.in" = CSM.degree.in,
                        "degree.out" = CSM.degree.out)

Random.degrees <- data.frame("graph.name" = "Random",
                          "degree.tot" = Random.degree.tot,
                          "degree.in" = Random.degree.in,
                          "degree.out" = Random.degree.out)
CSM.degrees %>% summary
Random.degrees %>% summary

Something special in the degree?

Exploratory plots

… massage the data…

CSM.degrees  %<>% melt
## Using graph.name as id variables
Random.degrees %<>% melt
## Using graph.name as id variables
rbind(CSM.degrees,Random.degrees) -> Full.degrees

Something special in the degree?

Exploratory plots

… define a suitable ggplot function …

Plot.density.degree <- function(data){
  data %>%
    ggplot(aes(x = value, fill=graph.name)) %>%
  + geom_histogram(binwidth = 1, alpha = .5, position = "identity") %>%
  + facet_grid(. ~ variable) %>%
  + theme_bw() %>% return
}

Something special in the degree?

Exploratory plots

Full.degrees %>% Plot.density.degree

Something special in the degree?

Is the mean special?

CSM.degree.tot %>% mean
## [1] 6.05
Random.degree.tot %>% mean
## [1] 5.375

Something special in the degree?

Is the peak special?

Peak.density <- function(array_values) {
  array_values %>% density %$% # compute the density
    x[which.max(y)] %>% return # find the value at which the density is max
}

CSM.degree.tot %>% Peak.density
## [1] 2.776099
Random.degree.tot %>% Peak.density
## [1] 5.015866

Something special in the degree?

Peak.density sanity test (1/2)

Full.degrees %>%
  filter(variable %>% equals("degree.tot")) %>%
  Plot.density.degree %>%
  + geom_vline(xintercept = CSM.degree.tot %>% Peak.density, col = "red") %>%
  + geom_vline(xintercept = Random.degree.tot %>% Peak.density, col = "blue") -> densities.vs.obs

Something special in the degree?

Peak.density sanity test (2/2)

densities.vs.obs

Something special in the degree?

Is the variance special?

CSM.degree.tot %>% var
## [1] 24.88354
Random.degree.tot %>% var
## [1] 4.870253

Hypothesis testing

When the going gets tough, the tough get going.

Test what?

The intrepid hypothesizers

\(H_0\): The degree distribution' (mean, peak, variance) is the same under the random (null) model and in the observed food web.

\(H_1\): No, it is not, you silly! How often do you see such a weird distribution!

Test what?

The repetitive testers

Idea: sample a random web,
compute its degree distribution,
compute the mean, peak and var,
compare with the observed one,
sample a random web,
compute its degree distribution,
compute the mean, peak and var,
compare with the observed one,
sample a random web,
compute its degree distribution,
compute the mean, peak and var,
compare with the observed one,
sample a random web,
compute its degree distribution,
compute the mean, peak and var,
compare with the observed one,
sample a random web,
compute its degree distribution,
compute the mean, peak and var,
compare with the observed one,

Hypothesis testing!

Let's build a randomizer (1/3).

Random sampler (we know this already!):

Sampler <- function(Graph) {
  Graph %>% {erdos.renyi.game(vcount(.), # number of vertices
                              Link.density(.), # link density
                              "gnp", T, F) # type, directed, and no loops
             } %>%
  return
}

Hypothesis testing!

Let's build a randomizer (2/3).

Compute the degrees:

Degreer <- function(Graph) {
  data.frame("degree.tot" = Graph %>% degree(mode = "total"),
             "degree.in" = Graph %>% degree(mode = "in"),
             "degree.out" = Graph %>% degree(mode = "out")) %>%
  return
}

Hypothesis testing!

Let's build a randomizer (3/3).

Compute the desired statistics:

# we define an ancillary function to compute the statistics we want
Stats <- function(Degrees,mode){
  data.frame("Mode" = mode, # We record whether in, out or total
             "Mean" = mode %>% Degrees[,.] %>% mean, # We want to compare the mean
             "Peak" = mode %>% Degrees[,.] %>% Peak.density, # The density peak
             "Var"  = mode %>% Degrees[,.] %>% var) %>% # And the variance
  return
}

For in, out and total degree:

Statistiker <- function(Degree.data) {
  rbind(
    Degree.data %>% Stats("degree.in"),
    Degree.data %>% Stats("degree.out"),
    Degree.data %>% Stats("degree.tot")
  ) %>% # We bind all together
  return
}

Hypothesis testing!

How does it work?

Now we have a pipeline from the graph to the statistics:

CSM.graph %>% # Get the graph
  Degreer %>% # Compute the degrees
  Statistiker %>% # Compute the statistics
  melt %>% # Melt the data (because plots)
  # mutate adds a news variable
  mutate(Origin = "observed") -> CSM.stats

Hypothesis testing!

Run the simulations (1/2).

Prepare the data…

R <- 42 # How many randomizations?
# Each sample of Statistics() is made by 3 rows
L <- R*3
# We initialise an empty data frame to store the simulations data
data.frame("Mode" = L %>% character,
           "Mean" = L %>% numeric,
           "Peak" = L %>% numeric,
           "Var"  = L %>% numeric) -> Rands.stats

Hypothesis testing!

Run the simulations (2/2).

… fill the data.

# Each sample of Statistics() is made by 3 rows
run.range <- seq(from = 1, to = L, by = 3)
for(run in run.range) {
  CSM.graph %>%
    Sampler %>%
    Degreer %>%
    Statistiker -> Rands.stats[run:(run+2),]
}

Hypothesis testing!

Plot, plot, plot.

Transform the data to the same format of CSM.data

Rands.stats %<>%
  melt %>% # Melt the data (because plots)
  # mutate adds a news variable
  mutate(Origin = "random")

Hypothesis testing!

Plot, plot, plot.

Put everything together

Full.stats <- rbind(Rands.stats,CSM.stats)

Hypothesis testing!

Plot, plot, plot.

… define a suitable ggplot function …

Plot.comparison <- function(Data.to.plot){
  Data.to.plot %>%
    ggplot(aes(x = value))%>% # the value on x and density on y
  + geom_histogram( # plot an histogram with the 'random' data
    data = Data.to.plot %>% filter(Origin %>% equals("random")), 
                   binwidth = 1) %>% # and set a proper bin width
  + geom_vline( # plot a vertical line with the 'observed' data
    data = Data.to.plot %>% filter(Origin %>% equals("observed")),
               aes(xintercept = value), # The observed statistics value
               col = "red", size = 2) %>% # And make it thick and red
  + facet_wrap(Mode ~ variable, scale="free") %>% # Do that for each Mode and statistics
  return
}

Hypothesis testing!

Plot, plot, plot.

Full.stats %>% Plot.comparison

Wow, we did it!

Thanks everybody!

Do we still have time?

An ecologically sensible plot

  • \(x\) axis Omnivory Index (from NetIndices)
  • \(y\) axis Trophic Level (from NetIndices)
  • vertex size degree (from igraph)
  • vertex color centrality (betweenness) rank (from igraph)

An ecologically sensible plot

Plot.foodweb <- function(Graph, dodge=0.08, heat=10, size.scale=3) {
   if("viridis" %in% rownames(installed.packages()) == FALSE) {install.packages("viridis")}
   library(viridis)
   Layer <- function(Graph){
       Graph %>% vcount -> V # How many vertices in the graph?
       Graph %>% get.adjacency(sparse = F) %>% TrophInd -> Trophs
       # We define a matrix with two columns:
       c(Trophs$OI + V %>% runif(max=dodge), # one with Omnivory Index (and a bit of noise)
         Trophs$TL - 1 # and one with the Trophic Levels
        ) %>% matrix(ncol=2) %>%
       return
     }
  Graph %>% betweenness(di=F,nor=T) %>% rank(ties="min") %>% max %>% viridis -> col_palette
  Graph %>% betweenness(di=F,nor=T) %>% rank(ties="min") %>% col_palette[.] -> V(Graph)$color
  Graph %>% get.adjacency(sparse = F) %>% TrophInd %$% OI -> Omni
  Graph %>% plot.igraph(layout= Layer(.),
                        vertex.label=NA,
                        vertex.size= degree(.) %>% sqrt * size.scale,
                        edge.arrow.size=.5,
                        edge.width=.5)
}

An ecologically sensible plot

par(mar=c(0,0,0,0))
CSM.graph %>% Plot.foodweb

Let's do it better!

Everything into a big function!

Tester <- function(Observed.Graph,Runs) {
  L <- Runs *3
  Observed.Graph %>% Degreer %>% Statistiker %>%
    melt %>% mutate(Origin = "observed") -> Observed.stats
  data.frame("Mode" = L %>% character,
             "Mean" = L %>% numeric,
             "Peak" = L %>% numeric,
             "Var"  = L %>% numeric) -> Rands.stats
  run.range <- seq(from = 1, to = L, by = 3)
  for(run in run.range) {
    Observed.Graph %>% Sampler %>%
    Degreer %>% Statistiker -> Rands.stats[run:(run+2),]
  }
  Rands.stats %>% melt %>%
    mutate(Origin = "random") %>% 
    rbind(Observed.stats) %>% Plot.comparison
}

Let's do it better!

Everything into a big function!

Graph.grabber <- function(web.source,linktype,linkevidence){
  web.source %>%
    url %>% # Open the connection
    read.csv(sep = "\t") %>% # Read in the raw data
    filter(LinkType %in% linktype) %>% # Filter on interactions type
    filter(LinkEvidence %in% linkevidence) %>% # and evidence type
    select(ResourceNodeID,ConsumerNodeID) %>% # Select the column of From and To
    graph.data.frame(directed=TRUE) %>%
return
}

Let's do it better!

"http://esapubs.org/Archive/ecol/E092/066/EPBweb_Links.txt" %>%
  Graph.grabber(c("predation",
                  "Social Predation",
                  "predation on free-living non-feeding stage"),
                c("observed",
                  "inferred")) %>%
  Plot.foodweb

Let's do it better!

"http://esapubs.org/Archive/ecol/E092/066/EPBweb_Links.txt" %>%
  Graph.grabber("predation","observed") %>%
  Tester(100)