Mutations (Part 3)

This guide describes some of the more complex options for defining the fitness effects of mutations. Most of the complexity comes from defining how the fitness of a cell with multiple mutations should be calculated.

multi_gene_array=False

This is the default and simplest option.
Each new mutation in a cell combines with the previous mutations without any regard for the gene it came from.


Run a simulation where all new mutations have a fitness of 1.1 These multiply the previous fitness of the cell

import numpy as np
from clone_competition_simulation import (
    Parameters, 
    TimeParameters, 
    PopulationParameters, 
    FitnessParameters,
    Gene, 
    MutationGenerator, 
    FixedValue 
)

mut_gen = MutationGenerator(
    genes=[Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0)], 
    combine_mutations='multiply'  # This is also the default option
)

np.random.seed(1)
p = Parameters(
    algorithm='Moran', 
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(initial_size_array=np.ones(4)),
    fitness=FitnessParameters(
        fitness_array=np.linspace(1, 1.3, 4),  # Giving the initial clones some different fitness values
        mutation_rates=0.1, 
        mutation_generator=mut_gen,     
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info())
clone id label fitness generation born parent clone id last gene mutated
0 0 0 1.000 0 -1 None
1 1 0 1.100 0 -1 None
2 2 0 1.200 0 -1 None
3 3 0 1.300 0 -1 None
4 4 0 1.320 22 2 Gene1
5 5 0 1.452 32 4 Gene1
6 6 0 1.452 39 4 Gene1

The fitness of clone_id 4 is 1.32 = 1.2 (the fitness of the parent clone 2) x 1.1 (the fitness of the new mut) The fitness of clone_id 5 is 1.452 = 1.32 (the fitness of the parent clone 4) x 1.1 (the fitness of the new mut)


For comparison with later, this is the raw_fitness_array, used to calculate the overall fitness With multi_gene_array=False it is not very interesting, just one column

print(s.raw_fitness_array)
[[1.   ],
 [1.1  ],
 [1.2  ],
 [1.3  ],
 [1.32 ],
 [1.452],
 [1.452]])

Mutation combination options

There are various other options as well as fitness multiplication

from clone_competition_simulation import MutationCombination

print(MutationCombination._member_names_)

['MULTIPLY', 'ADD', 'REPLACE', 'MAX', 'MIN']

For example, to add fitness values


mut_gen = MutationGenerator(
    genes=[
        Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0)
    ], 
    combine_mutations='add'  # Using add. Could also use MutationCombination.ADD
)

# The simulation is otherwise the same as before
np.random.seed(1)
p = Parameters(
    algorithm='Moran',
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(initial_size_array=np.ones(4)),
    fitness=FitnessParameters(
        fitness_array=np.linspace(1, 1.3, 4),  # Giving the initial clones some different fitness values
        mutation_rates=0.1, 
        mutation_generator=mut_gen,     
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info())
clone id label fitness generation born parent clone id last gene mutated
0 0 0 1.0 0 -1 None
1 1 0 1.1 0 -1 None
2 2 0 1.2 0 -1 None
3 3 0 1.3 0 -1 None
4 4 0 1.3 22 2 Gene1
5 5 0 1.4 32 4 Gene1
6 6 0 1.4 39 4 Gene1

Note this adds the fitness advantage from the new mutation.
So if the fitness is 1.1, there is a 0.1 advantage over neutral.
The fitness of clone_id 4 is now 1.3 = 1.2 + 0.1

multi_gene_array=True

This allows much more control on the combination of fitness, but is also more complicated.

The mutations from each gene are treated separately, and then combined after.

Mutations within each gene are combined based on MutationGenerator.combine_mutations.
Fitness from all genes is then combined based on MutationGenerator.combine_array.


This is the same simulation we ran before with multi_gene_array=False.
Running a simulation where all new mutations have a fitness of 1.1. These multiply the previous fitness of the cell


mut_gen = MutationGenerator(
    genes=[
        Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0)
    ], 
    multi_gene_array=True,
    combine_mutations='multiply', # This is the default option
    combine_array='multiply',  # This is also the default
)

np.random.seed(1)
p = Parameters(
    algorithm='Moran', 
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(initial_size_array=np.ones(4)),
    fitness=FitnessParameters(
        fitness_array=np.linspace(1, 1.3, 4),  # Giving the initial clones some different fitness values
        mutation_rates=0.1, 
        mutation_generator=mut_gen,     
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info())
clone id label fitness generation born parent clone id last gene mutated
0 0 0 1.000 0 -1 None
1 1 0 1.100 0 -1 None
2 2 0 1.200 0 -1 None
3 3 0 1.300 0 -1 None
4 4 0 1.320 22 2 Gene1
5 5 0 1.452 32 4 Gene1
6 6 0 1.452 39 4 Gene1

The clone info results are the same as before


But compare the raw_fitness_array to earlier.

With multi_gene_array=True, this is now a multidimensional array.

Each row is for a clone.
The first column is a fitness from the initial clones (often wild type/neutral fitness).
The next set of columns contain the fitness from each gene.
The total fitness for each clone is calculated from all the values in each row.
Here is it multiplied because MutationGenerator.combine_array='multiply' (any nan values are ignored).

print(s.raw_fitness_array)
[[1.  ,  nan],
 [1.1 ,  nan],
 [1.2 ,  nan],
 [1.3 ,  nan],
 [1.2 , 1.1 ],
 [1.2 , 1.21],
 [1.2 , 1.21]])

To see this along with the clone data, can use the include_raw_fitness argument

print(s.view_clone_info(include_raw_fitness=True))
clone id label fitness generation born parent clone id last gene mutated Initial clone fitness Gene1
0 0 0 1.000 0 -1 None 1.0 NaN
1 1 0 1.100 0 -1 None 1.1 NaN
2 2 0 1.200 0 -1 None 1.2 NaN
3 3 0 1.300 0 -1 None 1.3 NaN
4 4 0 1.320 22 2 Gene1 1.2 1.10
5 5 0 1.452 32 4 Gene1 1.2 1.21
6 6 0 1.452 39 4 Gene1 1.2 1.21

Using multiple genes adds extra columns to the raw_fitness_array.

You can combine effects between genes in a different way to combining mutations within a gene.
For example, if further mutations in the same gene have no further effect, we can use combine_mutations='max', 'replace', or 'min'.

But then you can still add the fitness effects from different genes.

mut_gen = MutationGenerator(
    genes=[
        Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0), 
        Gene(name='Gene2', mutation_distribution=FixedValue(1.05), synonymous_proportion=0)
    ], 
    multi_gene_array=True,
    combine_mutations='replace',   # With FixedValue this means further mutations will do nothing
    combine_array='add',  # Add the effects from the two genes
)

np.random.seed(0)
p = Parameters(
    algorithm='Moran',
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(initial_cells=6),
    fitness=FitnessParameters(
        mutation_rates=0.15, 
        mutation_generator=mut_gen,     
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info(include_raw_fitness=True))
clone id label fitness generation born parent clone id last gene mutated Initial clone fitness Gene1 Gene2
0 0 0 1.00 0 -1 None 1.0 NaN NaN
1 1 0 1.10 8 0 Gene1 1.0 1.1 NaN
2 2 0 1.10 13 0 Gene1 1.0 1.1 NaN
3 3 0 1.05 18 0 Gene2 1.0 NaN 1.05
4 4 0 1.15 25 2 Gene2 1.0 1.1 1.05
5 5 0 1.15 35 2 Gene2 1.0 1.1 1.05
6 6 0 1.10 48 0 Gene1 1.0 1.1 NaN

You can see that the overall fitness of each clone is the sum of fitness advantage from the last three columns.


The options for combination across genes are similar to the options within genes.
priority is an odd case, where only the last mutated gene in the list (or the last epistatic effect) is used.
It might be relevant for cases where the knockout of one gene will override the function of downstream genes.

from clone_competition_simulation import ArrayCombination
print(ArrayCombination._member_names_)
['MULTIPLY', 'ADD', 'MAX', 'MIN', 'PRIORITY']

Epistatic effects

The combination of mutations in different genes may be more complex than simply addition or multiplication.
Defining epistatic effects allows more control of the combinations of fitness.

For example, this can be used for haplosufficient or haploinsufficient genes by defining a separate Gene object for each allele and an epistatic effect for when both alleles are mutated.

from clone_competition_simulation import EpistaticEffect

mut_gen = MutationGenerator(
    genes=[
        Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0), 
        Gene(name='Gene2', mutation_distribution=FixedValue(1.05), synonymous_proportion=0)
    ],                 
    # Add an epistatic effect. 
    # If both Gene1 and Gene2 are mutated, the clone has a fitness of 3
    # This replaces the individual effects of both genes
    epistatics=[
        EpistaticEffect(
            name='Epi1', 
            gene_names=['Gene1', 'Gene2'], 
            fitness_distribution=FixedValue(3)
        )
    ],
    multi_gene_array=True,
    combine_mutations='replace',   # With FixedValue this means further mutations will do nothing
)

np.random.seed(0)
p = Parameters(
    algorithm='Moran',
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(initial_cells=6),
    fitness=FitnessParameters(
        mutation_rates=0.15, 
        mutation_generator=mut_gen,     
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info(include_raw_fitness=True))
clone id label fitness generation born parent clone id last gene mutated Initial clone fitness Gene1 Gene2 Epi1
0 0 0 1.00 0 -1 None 1.0 NaN NaN NaN
1 1 0 1.10 8 0 Gene1 1.0 1.1 NaN NaN
2 2 0 1.10 13 0 Gene1 1.0 1.1 NaN NaN
3 3 0 1.05 18 0 Gene2 1.0 NaN 1.05 NaN
4 4 0 3.00 25 2 Gene2 1.0 1.1 1.05 3.0
5 5 0 3.00 35 4 Gene2 1.0 1.1 1.05 3.0
6 6 0 3.00 48 4 Gene1 1.0 1.1 1.05 3.0

Note that for the last three clones, their fitness is 3 because both genes are mutated.

Checking fitness rules are correct

It can be a complex system for defining the rules for mutation combinations.
To check that the fitness values are what is intended, you can plot the (mean) fitness of the combination of gene mutations.
This assumes that there is at most a single mutation per gene.

Use this function from the MutationGenerator to plot the fitness of each combination of mutations. This is for the epistatic example above.

import matplotlib.pyplot as plt

fitness_combinations = mut_gen.plot_fitness_combinations()
print(f"Fitness combinations:  {fitness_combinations}")
plt.show()
Fitness combinations:  [1.   1.1  1.05 3.  ]

png


Add an extra gene and change some options and check again For any non-FixedValue distributions of fitness effects, the mean value is used.

from clone_competition_simulation import UniformDist

mut_gen2 = MutationGenerator(
    genes=[
        Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0), 
        Gene(name='Gene2', mutation_distribution=FixedValue(1.05), synonymous_proportion=0),
        Gene(name='Gene3', mutation_distribution=UniformDist(1, 2), synonymous_proportion=0)],                            
    epistatics=[
        EpistaticEffect(
            name='Epi1', 
            gene_names=['Gene1', 'Gene2'], 
            fitness_distribution=FixedValue(3)
        )
    ],
    multi_gene_array=True,
    combine_array='multiply'
)
fitness_combinations = mut_gen2.plot_fitness_combinations()
print(f"Fitness combinations:  {fitness_combinations}")
plt.show()
Fitness combinations:  [1.    1.1   1.05  3.    1.5   1.65  1.575 4.5  ]

png


Can see the difference using combine_array='max' instead of combine_array='multiply'

mut_gen2 = MutationGenerator(
    genes=[
        Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0), 
        Gene(name='Gene2', mutation_distribution=FixedValue(1.05), synonymous_proportion=0),
        Gene(name='Gene3', mutation_distribution=UniformDist(1, 2), synonymous_proportion=0)],                            
    epistatics=[
        EpistaticEffect(
            name='Epi1', 
            gene_names=['Gene1', 'Gene2'], 
            fitness_distribution=FixedValue(3)
        )
    ],
    multi_gene_array=True,
    combine_array='max'
)
fitness_combinations = mut_gen2.plot_fitness_combinations()
print(f"Fitness combinations:  {fitness_combinations}")
plt.show()
Fitness combinations:  [1.   1.1  1.05 3.   1.5  1.5  1.5  3.  ]

png

Assigning genes to the initial mutations

You can start a simulation with multiple clones and assign a gene to each of them.
This may be useful for lineage tracing style simulations in which a mutant is induced (defined at the start), and then subsequent mutations can appear on top of the initial mutations.


# Have similar epistatic rules as above
# But stop Gene1 from mutating at random by setting weight=0
mut_gen = MutationGenerator(
    genes=[
        Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0, weight=0), 
        Gene(name='Gene2', mutation_distribution=FixedValue(1.05), synonymous_proportion=0)
    ],
    epistatics=[
        EpistaticEffect(
            name='Epi1', 
            gene_names=['Gene1', 'Gene2'], 
            fitness_distribution=FixedValue(2)
        )
    ],
    multi_gene_array=True,
    combine_mutations='replace',   # With FixedValue this means further mutations will do nothing
)

# This time, start with some mutations in Gene1. 
# Then mutate Gene2, which has the epistatic interaction with Gene1
np.random.seed(1)
p = Parameters(
    algorithm='Moran', 
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(
        initial_size_array=[6, 1, 1, 1, 1],   # Set up 4 1-cell clones and 6 WT cells
    ),
    fitness=FitnessParameters(
        mutation_rates=0.08, 
        mutation_generator=mut_gen,     

        # The initial_mutant_gene_array defines the genes to be associated with the gene
        # The value is the index of the Gene in MutationGenerator.genes
        # Use -1 for the clones with no mutation/associated gene
        initial_mutant_gene_array=[-1, 0, 0, 0, 0],  

        # Also need to define the fitness array or all initial clones will have a fitness=1
        # These fitness values do not have to equal the fitness values from the Gene assigned. 
        fitness_array=[1, 1.1, 1.1, 1.1, 1.1],
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info(include_raw_fitness=True))
clone id label fitness generation born parent clone id last gene mutated Initial clone fitness Gene1 Gene2 Epi1
0 0 0 1.00 0 -1 None 1.0 NaN NaN NaN
1 1 0 1.10 0 -1 Gene1 1.0 1.1 NaN NaN
2 2 0 1.10 0 -1 Gene1 1.0 1.1 NaN NaN
3 3 0 1.10 0 -1 Gene1 1.0 1.1 NaN NaN
4 4 0 1.10 0 -1 Gene1 1.0 1.1 NaN NaN
5 5 0 1.05 22 0 Gene2 1.0 NaN 1.05 NaN
6 6 0 2.00 32 4 Gene2 1.0 1.1 1.05 2.0
7 7 0 2.00 39 3 Gene2 1.0 1.1 1.05 2.0
8 8 0 2.00 66 6 Gene2 1.0 1.1 1.05 2.0
9 9 0 2.00 73 8 Gene2 1.0 1.1 1.05 2.0
10 10 0 2.00 83 8 Gene2 1.0 1.1 1.05 2.0
11 11 0 2.00 86 8 Gene2 1.0 1.1 1.05 2.0

The initial clones (generation_born=0) now can have epistatic interactions.

Diminishing returns

As a clone gains more mutations and becomes fitter, there may be less and less room for improvement.
You can apply a transformation of the combined fitness effects of all mutations to introduce diminishing returns (or other function).


# Run a simulation with loads of mutations that multiply in fitness
mut_gen = MutationGenerator(
    genes=[Gene(name='Gene1', mutation_distribution=FixedValue(1.4), synonymous_proportion=0)], 
    combine_mutations='multiply' 
)

np.random.seed(0)
p = Parameters(
    algorithm='Moran',
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(
        initial_cells=1000
    ),
    fitness=FitnessParameters(
        mutation_rates=0.1, 
        mutation_generator=mut_gen
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info()[-10:])
clone id label fitness generation born parent clone id last gene mutated
1000 1000 0 2295.856929 99 985 Gene1
1001 1001 0 426.878854 99 806 Gene1
1002 1002 0 304.913467 100 860 Gene1
1003 1003 0 1639.897806 100 930 Gene1
1004 1004 0 304.913467 100 841 Gene1
1005 1005 0 1639.897806 100 930 Gene1
1006 1006 0 1171.355576 100 886 Gene1
1007 1007 0 1171.355576 100 891 Gene1
1008 1008 0 597.630396 100 863 Gene1
1009 1009 0 2295.856929 100 985 Gene1

You can see that the fitness values of the clones has become very high!


The final clones have many mutations, each multiplying the fitness by 1.4.
This compound interest of mutations can lead to very high fitness

print(s.get_clone_ancestors(1009))
[1009,985,984,886,848,847,791,790,786,735,620,483,470,464,405,390,346,340,339,319,198,116,115,0,-1]

One (extreme) option to limit to growth of fitness values is to use the 'max' option - for this simulation, this would limit the fitness for clones at 1.4.

However, there is another option that enables diminishing returns to be applied.


The BoundedLogisticFitness class places an upper bound on fitness. Here fitness cannot get above 3.

from clone_competition_simulation import BoundedLogisticFitness

b = BoundedLogisticFitness(3)
plt.plot(np.linspace(0, 10, 100), b.fitness(np.linspace(0, 10, 100)))
plt.xlabel('Raw fitness')
plt.ylabel('Transformed fitness')
plt.show()

png


The shape of the fitness transformation can also be altered.


b = BoundedLogisticFitness(3)
plt.plot(np.linspace(0, 10, 100), b.fitness(np.linspace(0, 10, 100)))
b2 = BoundedLogisticFitness(3, 10)
plt.plot(np.linspace(0, 10, 100), b2.fitness(np.linspace(0, 10, 100)))
plt.xlabel('Raw fitness')
plt.ylabel('Transformed fitness')
plt.show()

png


Running a simulation again, but limiting the max fitness.

mut_gen = MutationGenerator(
    genes=[Gene(name='Gene1', mutation_distribution=FixedValue(1.4), synonymous_proportion=0)], 
    combine_mutations='multiply', 
    mutation_combination_class=BoundedLogisticFitness(3)   # This limits fitness to 3
)

np.random.seed(0)
p = Parameters(
    algorithm='Moran',
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(
        initial_cells=1000
    ),
    fitness=FitnessParameters(
        mutation_rates=0.1, 
        mutation_generator=mut_gen
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info(include_raw_fitness=True)[-10:])
clone id label fitness generation born parent clone id last gene mutated Gene1
1000 1000 0 2.926558 99 964 Gene1 5.378240
1001 1001 0 2.926558 99 357 Gene1 5.378240
1002 1002 0 2.999569 100 464 Gene1 10.541350
1003 1003 0 2.926558 100 716 Gene1 5.378240
1004 1004 0 2.926558 100 360 Gene1 5.378240
1005 1005 0 2.926558 100 691 Gene1 5.378240
1006 1006 0 2.222816 100 586 Gene1 2.744000
1007 1007 0 2.999994 100 611 Gene1 14.757891
1008 1008 0 2.991267 100 454 Gene1 7.529536
1009 1009 0 2.999569 100 953 Gene1 10.541350

Now those final clones have their fitness capped.
The raw fitness is also lower, since the fitness reduction also reduces the advantage of the fittest clones (which through expansion would gain more subclones and become even fitter)