Mutations (Part 1)

Mutations can be added during the course of a simulation.
These may change the cell fitness and therefore the cell behaviour.

There are quite a few mutation-related parameters that can be defined: - the mutation rate - the proportions of synonymous and non-synonymous mutations - the distribution of fitness effects for non-synonymous mutations - how the effects of multiple mutations combine to produce the overall cell fitness

This last point is can get pretty complicated and is covered in more detail in later guides.

This guide describes the basics of using mutations in simulations.

Setting the mutation rate

If mutation_rates is left undefined or set to zero, no mutations occur

from clone_competition_simulation import Parameters, TimeParameters, PopulationParameters
import matplotlib.pyplot as plt

p = Parameters(
    algorithm='Moran', 
    times=TimeParameters(max_time=10, division_rate=1), 
    population=PopulationParameters(initial_cells=1000)
)
s = p.get_simulator()
s.run_sim()
s.muller_plot(figsize=(5, 5))
plt.show()

png

This plot is just a single colour, because there is a single clone (the initial clone of 1000 cells)


Setting a non-zero mutation_rates randomly introduces mutations. The value is the probability of a mutation occurring per cell division.

To introduce mutations, we must also define what effect the mutations will have. This is explained further in Controlling the mutation effects below.

from clone_competition_simulation import (
    FitnessParameters, 
    Gene, 
    MutationGenerator, 
    NormalDist
)

# We'll add some mutations, so we have to define the fitness of the mutated cells
mutation_generator = MutationGenerator(
    genes=[Gene(name="Gene1", mutation_distribution=NormalDist(mean=1.1, var=0.1), 
                synonymous_proportion=0.5)]
)

p = Parameters(
    algorithm='Moran',
    times=TimeParameters(max_time=10, division_rate=1), 
    population=PopulationParameters(initial_cells=1000),
    fitness=FitnessParameters(
        mutation_rates=0.01, 
        mutation_generator=mutation_generator
    ),
)
s = p.get_simulator()
s.run_sim()
s.muller_plot(figsize=(5, 5))
plt.show()

png

The mutations that occur are marked with a X and are followed by the clones they form. Red Xs mark non-synonymous mutations and blue Xs mark synonymous mutations


The mutation rate can vary. Here we set a high mutation rate from time 1 until time 3 and then a lower mutation rate after that

The mutation_rates are set as a list/array of pairs of [start_time, mutation_rate] The mutation_rate will apply until the next start_time point or until the end of the simulation. Before the first start_time (if not zero) the mutation_rate is zero.

p = Parameters(
    algorithm='Moran',
    times=TimeParameters(max_time=10, division_rate=1), 
    population=PopulationParameters(initial_cells=1000),
    fitness=FitnessParameters(
        mutation_rates=[
            [1, 0.1],  # At time 1, start a mutation rate of 0.1 per cell division
            [4, 0.01]  # At time 4, start a mutation rate of 0.01 per cell division. This continues until the end.
        ], 
        mutation_generator=mutation_generator
    ),
)
s = p.get_simulator()
s.run_sim()
s.muller_plot(figsize=(5, 5))
plt.show()

png

Can see the higher density of mutations between times 1 and 4.

Controlling the mutation effects

The fitness effects of mutations are defined in two classes: Gene and MutationGenerator.

Gene is used to define the proportion of non-synonymous and synonymous mutations and the distribution of fitness effects of the non-synonymous mutations.

The MutationGenerator is used to define how multiple mutations (which can be from multiple genes) combine to form the overall cell fitness.

In this guide, we will only use the default parameters from the MutationGenerator, which is that all fitness effects are multiplied together.
More options are described in other guides.


Let's define a new gene

from clone_competition_simulation import Gene, UniformDist

gene1 = Gene(
    name='Gene1',  # This is the name for the gene. 
    mutation_distribution=UniformDist(0.5, 1.5),  # This defines the fitness effects of non-synonymous mutations
    synonymous_proportion=0.4   # This means 40% of the mutations will be synonymous
)

and use that gene in a simulation

from clone_competition_simulation import MutationGenerator
import numpy as np

mut_gen = MutationGenerator(genes=[gene1])
np.random.seed(0)
p = Parameters(
    algorithm='Moran2D',
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(initial_cells=100, cell_in_own_neighbourhood=False),
    fitness=FitnessParameters(
        mutation_rates=0.01,
        mutation_generator=mut_gen
    )
)
s = p.get_simulator()
s.run_sim()
print(s.view_clone_info())
clone id label fitness generation born parent clone id last gene mutated
0 0 0 1.000000 0 -1 None
1 1 0 1.000000 15 0 Gene1
2 2 0 1.000000 21 0 Gene1
3 3 0 1.339029 42 0 Gene1
4 4 0 0.933265 45 0 Gene1
5 5 0 0.942416 48 0 Gene1
6 6 0 0.876017 60 0 Gene1
7 7 0 1.339029 62 3 Gene1
8 8 0 0.942416 76 5 Gene1
9 9 0 0.643681 97 0 Gene1

The fitness values of the new clones are between 0.5 and 1.5, and the gene mutated is Gene1. Some mutations will be synonymous and will not change the fitness


There are a few pre-written options for the distribution of fitness effects.
Custom distribution classes can be used as long as they have: - a __call__ function that doesn't take any argument and returns a fitness value - a get_mean function that returns the mean of the distribution

We can run a simulation with multiple genes showing some of the options already available.

from clone_competition_simulation import NormalDist, ExponentialDist, FixedValue

# One with a normal distribution
gene_norm = Gene(name='GeneNorm', mutation_distribution=NormalDist(mean=0.6, var=0.1), synonymous_proportion=0.5)

# One with an exponential distribution
gene_exp = Gene(name='GeneExp', mutation_distribution=ExponentialDist(mean=1.05, offset=1), synonymous_proportion=0.4)

# One where every non-synonymous mutation has the same fixed value of fitness
gene_fix = Gene(name='GeneFix', mutation_distribution=FixedValue(value=1.01), synonymous_proportion=0.3)

mut_gen = MutationGenerator(genes=[gene1, gene_norm, gene_exp, gene_fix])
np.random.seed(0)
p = Parameters(
    algorithm='Moran2D',
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(initial_cells=100, cell_in_own_neighbourhood=False),
    fitness=FitnessParameters(
        mutation_rates=0.03,
        mutation_generator=mut_gen
    )
)
s = p.get_simulator()
s.run_sim()
print(s.view_clone_info())
clone id label fitness generation born parent clone id last gene mutated
0 0 0 1.000000 0 -1 None
1 1 0 1.010000 3 0 GeneFix
2 2 0 1.000000 6 0 GeneNorm
3 3 0 1.010000 7 0 GeneFix
4 4 0 1.034740 7 0 GeneExp
5 5 0 1.010000 15 0 GeneFix
6 6 0 0.516414 16 0 GeneNorm
7 7 0 1.000000 21 0 GeneFix
8 8 0 1.110256 27 0 GeneExp
9 9 0 1.000000 28 0 GeneNorm
10 10 0 0.505791 29 0 GeneNorm
11 11 0 0.510849 42 10 GeneFix
12 12 0 1.042249 42 0 GeneExp
13 13 0 1.000000 44 0 GeneNorm
14 14 0 1.004789 44 0 GeneExp
15 15 0 0.994093 47 0 Gene1
16 16 0 1.000000 55 0 GeneFix
17 17 0 1.000000 58 0 GeneExp
18 18 0 1.010000 59 0 GeneFix
19 19 0 1.010000 60 0 GeneFix
20 20 0 0.659173 61 0 GeneNorm
21 21 0 1.038381 61 0 GeneExp
22 22 0 0.552378 63 16 GeneNorm
23 23 0 1.010000 64 0 GeneFix
24 24 0 1.000000 71 0 GeneNorm
25 25 0 1.000000 72 0 GeneNorm
26 26 0 1.000000 75 0 GeneFix
27 27 0 1.015429 77 0 GeneExp
28 28 0 1.298854 89 0 Gene1
29 29 0 0.606296 93 0 GeneNorm
30 30 0 0.512835 95 0 GeneNorm
31 31 0 1.049568 100 0 GeneExp

To give different genes different mutation rates you can use the weight argument in Gene.
The weights can be any non-negative number.

The overall mutation rate is controlled by the Parameters mutations_rates argument.
The Gene weight controls the relative mutation rate of the genes.
By default weight=1, so all genes will have the same mutation rate.

# Two genes, where gene1 is 3 times more likely to be mutated than gene2
gene1 = Gene(name='Gene1', mutation_distribution=UniformDist(0.5, 1.1), synonymous_proportion=0.4, weight=3)
gene2 = Gene(name='Gene2', mutation_distribution=UniformDist(1.1, 1.5), synonymous_proportion=0.4, weight=1)

mut_gen = MutationGenerator(genes=[gene1, gene2])
p = Parameters(
    algorithm='Moran2D',
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(initial_cells=100, cell_in_own_neighbourhood=False),
    fitness=FitnessParameters(
        mutation_rates=0.1,
        mutation_generator=mut_gen
    )
)
s = p.get_simulator()
s.run_sim()
print(s.view_clone_info()['last gene mutated'].value_counts())
last gene mutated
Gene1    72
Gene2    23
Name: count, dtype: int64

dN/dS

The use of synonymous and non-synonymous mutations means we can calculate a dN/dS value.
This can be done for the entire simulation or for each gene.

gene1 = Gene(name='Gene1', mutation_distribution=UniformDist(0.5, 1.1), synonymous_proportion=0.4, weight=3)
gene2 = Gene(name='Gene2', mutation_distribution=UniformDist(1.1, 1.5), synonymous_proportion=0.5, weight=1)

mut_gen = MutationGenerator(genes=[gene1, gene2])
p = Parameters(
    algorithm='Moran2D',
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(initial_cells=62500, cell_in_own_neighbourhood=False),
    fitness=FitnessParameters(
        mutation_rates=0.01,
        mutation_generator=mut_gen
    )
)
s = p.get_simulator()
s.run_sim()

Just running the sim.get_dnds function returns the overall dN/dS ratio at the end of the simulation

print(s.get_dnds())
1.0231533250016447

Adding a time gets the dN/dS at that time point

print(s.get_dnds(t=5))
0.9779496754690142

You can also get the dN/dS for each gene

print(s.get_dnds(gene='Gene1'), s.get_dnds(gene='Gene2')) 
(0.9395973154362416, 1.3317757009345794)

For sequencing experiments, you usually cannot detect very small clones. You can calculate the dN/dS for larger clones only

print(s.get_dnds(gene='Gene2', min_size=5)) 
1.8591549295774648

You can plot the dN/dS over time. The early time points here are not reliable due to low sample size

s.plot_dnds()
plt.show()

png

s.plot_dnds(gene='Gene1')
s.plot_dnds(gene='Gene2', clear_previous=False)
plt.show()

png

s.plot_dnds(gene='Gene2', min_size=5)
plt.show()

png