Mutations (Part 3)

This guide describes some of the more complex options for defining the fitness effects of mutations. Most of the complexity comes from defining how the fitness of a cell with multiple mutations should be calculated.

multi_gene_array=False

This is the default and simplest option.
Each new mutation in a cell combines with the previous mutations without any regard for the gene it came from.

Run a simulation where all new mutations have a fitness of 1.1 These multiply the previous fitness of the cell

import numpy as np
from clone_competition_simulation import (
    Parameters, 
    TimeParameters, 
    PopulationParameters, 
    FitnessParameters,
    Gene, 
    MutationGenerator, 
    FixedValue 
)

mut_gen = MutationGenerator(
    genes=[Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0)], 
    combine_mutations='multiply'  # This is also the default option
)

np.random.seed(1)
p = Parameters(
    algorithm='Moran', 
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(initial_size_array=np.ones(4)),
    fitness=FitnessParameters(
        fitness_array=np.linspace(1, 1.3, 4),  # Giving the initial clones some different fitness values
        mutation_rates=0.1, 
        mutation_generator=mut_gen,     
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info())

	clone id	fitness	generation born	parent clone id	last gene mutated
0	0	1.000	0	-1	None
1	1	1.100	0	-1	None
2	2	1.200	0	-1	None
3	3	1.300	0	-1	None
4	4	1.320	22	2	Gene1
5	5	1.452	32	4	Gene1
6	6	1.452	39	4	Gene1

The fitness of clone_id 4 is 1.32 = 1.2 (the fitness of the parent clone 2) x 1.1 (the fitness of the new mut) The fitness of clone_id 5 is 1.452 = 1.32 (the fitness of the parent clone 4) x 1.1 (the fitness of the new mut)

For comparison with later, this is the raw_fitness_array, used to calculate the overall fitness With multi_gene_array=False it is not very interesting, just one column

print(s.raw_fitness_array)

[[1.   ],
 [1.1  ],
 [1.2  ],
 [1.3  ],
 [1.32 ],
 [1.452],
 [1.452]])

Mutation combination options

There are various other options as well as fitness multiplication

from clone_competition_simulation import MutationCombination

print(MutationCombination._member_names_)

['MULTIPLY', 'ADD', 'REPLACE', 'MAX', 'MIN']

For example, to add fitness values


mut_gen = MutationGenerator(
    genes=[
        Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0)
    ], 
    combine_mutations='add'  # Using add. Could also use MutationCombination.ADD
)

# The simulation is otherwise the same as before
np.random.seed(1)
p = Parameters(
    algorithm='Moran',
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(initial_size_array=np.ones(4)),
    fitness=FitnessParameters(
        fitness_array=np.linspace(1, 1.3, 4),  # Giving the initial clones some different fitness values
        mutation_rates=0.1, 
        mutation_generator=mut_gen,     
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info())

	clone id	fitness	generation born	parent clone id	last gene mutated
0	0	1.0	0	-1	None
1	1	1.1	0	-1	None
2	2	1.2	0	-1	None
3	3	1.3	0	-1	None
4	4	1.3	22	2	Gene1
5	5	1.4	32	4	Gene1
6	6	1.4	39	4	Gene1

Note this adds the fitness advantage from the new mutation.
So if the fitness is 1.1, there is a 0.1 advantage over neutral.
The fitness of clone_id 4 is now 1.3 = 1.2 + 0.1

multi_gene_array=True

This allows much more control on the combination of fitness, but is also more complicated.

The mutations from each gene are treated separately, and then combined after.

Mutations within each gene are combined based on MutationGenerator.combine_mutations.
Fitness from all genes is then combined based on MutationGenerator.combine_array.

This is the same simulation we ran before with multi_gene_array=False.
Running a simulation where all new mutations have a fitness of 1.1. These multiply the previous fitness of the cell


mut_gen = MutationGenerator(
    genes=[
        Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0)
    ], 
    multi_gene_array=True,
    combine_mutations='multiply', # This is the default option
    combine_array='multiply',  # This is also the default
)

np.random.seed(1)
p = Parameters(
    algorithm='Moran', 
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(initial_size_array=np.ones(4)),
    fitness=FitnessParameters(
        fitness_array=np.linspace(1, 1.3, 4),  # Giving the initial clones some different fitness values
        mutation_rates=0.1, 
        mutation_generator=mut_gen,     
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info())

	clone id	fitness	generation born	parent clone id	last gene mutated
0	0	1.000	0	-1	None
1	1	1.100	0	-1	None
2	2	1.200	0	-1	None
3	3	1.300	0	-1	None
4	4	1.320	22	2	Gene1
5	5	1.452	32	4	Gene1
6	6	1.452	39	4	Gene1

The clone info results are the same as before

But compare the raw_fitness_array to earlier.

With multi_gene_array=True, this is now a multidimensional array.

Each row is for a clone.
The first column is a fitness from the initial clones (often wild type/neutral fitness).
The next set of columns contain the fitness from each gene.
The total fitness for each clone is calculated from all the values in each row.
Here is it multiplied because MutationGenerator.combine_array='multiply' (any nan values are ignored).

print(s.raw_fitness_array)

[[1.  ,  nan],
 [1.1 ,  nan],
 [1.2 ,  nan],
 [1.3 ,  nan],
 [1.2 , 1.1 ],
 [1.2 , 1.21],
 [1.2 , 1.21]])

To see this along with the clone data, can use the include_raw_fitness argument

print(s.view_clone_info(include_raw_fitness=True))

	clone id	fitness	generation born	parent clone id	last gene mutated	Initial clone fitness	Gene1
0	0	1.000	0	-1	None	1.0	NaN
1	1	1.100	0	-1	None	1.1	NaN
2	2	1.200	0	-1	None	1.2	NaN
3	3	1.300	0	-1	None	1.3	NaN
4	4	1.320	22	2	Gene1	1.2	1.10
5	5	1.452	32	4	Gene1	1.2	1.21
6	6	1.452	39	4	Gene1	1.2	1.21

Using multiple genes adds extra columns to the raw_fitness_array.

You can combine effects between genes in a different way to combining mutations within a gene.
For example, if further mutations in the same gene have no further effect, we can use combine_mutations='max', 'replace', or 'min'.

But then you can still add the fitness effects from different genes.

mut_gen = MutationGenerator(
    genes=[
        Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0), 
        Gene(name='Gene2', mutation_distribution=FixedValue(1.05), synonymous_proportion=0)
    ], 
    multi_gene_array=True,
    combine_mutations='replace',   # With FixedValue this means further mutations will do nothing
    combine_array='add',  # Add the effects from the two genes
)

np.random.seed(0)
p = Parameters(
    algorithm='Moran',
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(initial_cells=6),
    fitness=FitnessParameters(
        mutation_rates=0.15, 
        mutation_generator=mut_gen,     
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info(include_raw_fitness=True))

	clone id	fitness	generation born	parent clone id	last gene mutated	Initial clone fitness	Gene1	Gene2
0	0	1.00	0	-1	None	1.0	NaN	NaN
1	1	1.10	8	0	Gene1	1.0	1.1	NaN
2	2	1.10	13	0	Gene1	1.0	1.1	NaN
3	3	1.05	18	0	Gene2	1.0	NaN	1.05
4	4	1.15	25	2	Gene2	1.0	1.1	1.05
5	5	1.15	35	2	Gene2	1.0	1.1	1.05
6	6	1.10	48	0	Gene1	1.0	1.1	NaN

You can see that the overall fitness of each clone is the sum of fitness advantage from the last three columns.

The options for combination across genes are similar to the options within genes.
priority is an odd case, where only the last mutated gene in the list (or the last epistatic effect) is used.
It might be relevant for cases where the knockout of one gene will override the function of downstream genes.

from clone_competition_simulation import ArrayCombination
print(ArrayCombination._member_names_)

['MULTIPLY', 'ADD', 'MAX', 'MIN', 'PRIORITY']

Epistatic effects

The combination of mutations in different genes may be more complex than simply addition or multiplication.
Defining epistatic effects allows more control of the combinations of fitness.

For example, this can be used for haplosufficient or haploinsufficient genes by defining a separate Gene object for each allele and an epistatic effect for when both alleles are mutated.

from clone_competition_simulation import EpistaticEffect

mut_gen = MutationGenerator(
    genes=[
        Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0), 
        Gene(name='Gene2', mutation_distribution=FixedValue(1.05), synonymous_proportion=0)
    ],                 
    # Add an epistatic effect. 
    # If both Gene1 and Gene2 are mutated, the clone has a fitness of 3
    # This replaces the individual effects of both genes
    epistatics=[
        EpistaticEffect(
            name='Epi1', 
            gene_names=['Gene1', 'Gene2'], 
            fitness_distribution=FixedValue(3)
        )
    ],
    multi_gene_array=True,
    combine_mutations='replace',   # With FixedValue this means further mutations will do nothing
)

np.random.seed(0)
p = Parameters(
    algorithm='Moran',
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(initial_cells=6),
    fitness=FitnessParameters(
        mutation_rates=0.15, 
        mutation_generator=mut_gen,     
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info(include_raw_fitness=True))

	clone id	fitness	generation born	parent clone id	last gene mutated	Initial clone fitness	Gene1	Gene2	Epi1
0	0	1.00	0	-1	None	1.0	NaN	NaN	NaN
1	1	1.10	8	0	Gene1	1.0	1.1	NaN	NaN
2	2	1.10	13	0	Gene1	1.0	1.1	NaN	NaN
3	3	1.05	18	0	Gene2	1.0	NaN	1.05	NaN
4	4	3.00	25	2	Gene2	1.0	1.1	1.05	3.0
5	5	3.00	35	4	Gene2	1.0	1.1	1.05	3.0
6	6	3.00	48	4	Gene1	1.0	1.1	1.05	3.0

Note that for the last three clones, their fitness is 3 because both genes are mutated.

Checking fitness rules are correct

It can be a complex system for defining the rules for mutation combinations.
To check that the fitness values are what is intended, you can plot the (mean) fitness of the combination of gene mutations.
This assumes that there is at most a single mutation per gene.

Use this function from the MutationGenerator to plot the fitness of each combination of mutations. This is for the epistatic example above.

import matplotlib.pyplot as plt

fitness_combinations = mut_gen.plot_fitness_combinations()
print(f"Fitness combinations:  {fitness_combinations}")
plt.show()

Fitness combinations:  [1.   1.1  1.05 3.  ]

png

Add an extra gene and change some options and check again For any non-FixedValue distributions of fitness effects, the mean value is used.

from clone_competition_simulation import UniformDist

mut_gen2 = MutationGenerator(
    genes=[
        Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0), 
        Gene(name='Gene2', mutation_distribution=FixedValue(1.05), synonymous_proportion=0),
        Gene(name='Gene3', mutation_distribution=UniformDist(1, 2), synonymous_proportion=0)],                            
    epistatics=[
        EpistaticEffect(
            name='Epi1', 
            gene_names=['Gene1', 'Gene2'], 
            fitness_distribution=FixedValue(3)
        )
    ],
    multi_gene_array=True,
    combine_array='multiply'
)
fitness_combinations = mut_gen2.plot_fitness_combinations()
print(f"Fitness combinations:  {fitness_combinations}")
plt.show()

Fitness combinations:  [1.    1.1   1.05  3.    1.5   1.65  1.575 4.5  ]

png

Can see the difference using combine_array='max' instead of combine_array='multiply'

mut_gen2 = MutationGenerator(
    genes=[
        Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0), 
        Gene(name='Gene2', mutation_distribution=FixedValue(1.05), synonymous_proportion=0),
        Gene(name='Gene3', mutation_distribution=UniformDist(1, 2), synonymous_proportion=0)],                            
    epistatics=[
        EpistaticEffect(
            name='Epi1', 
            gene_names=['Gene1', 'Gene2'], 
            fitness_distribution=FixedValue(3)
        )
    ],
    multi_gene_array=True,
    combine_array='max'
)
fitness_combinations = mut_gen2.plot_fitness_combinations()
print(f"Fitness combinations:  {fitness_combinations}")
plt.show()

Fitness combinations:  [1.   1.1  1.05 3.   1.5  1.5  1.5  3.  ]

png

Assigning genes to the initial mutations

You can start a simulation with multiple clones and assign a gene to each of them.
This may be useful for lineage tracing style simulations in which a mutant is induced (defined at the start), and then subsequent mutations can appear on top of the initial mutations.

# Have similar epistatic rules as above
# But stop Gene1 from mutating at random by setting weight=0
mut_gen = MutationGenerator(
    genes=[
        Gene(name='Gene1', mutation_distribution=FixedValue(1.1), synonymous_proportion=0, weight=0), 
        Gene(name='Gene2', mutation_distribution=FixedValue(1.05), synonymous_proportion=0)
    ],
    epistatics=[
        EpistaticEffect(
            name='Epi1', 
            gene_names=['Gene1', 'Gene2'], 
            fitness_distribution=FixedValue(2)
        )
    ],
    multi_gene_array=True,
    combine_mutations='replace',   # With FixedValue this means further mutations will do nothing
)

# This time, start with some mutations in Gene1. 
# Then mutate Gene2, which has the epistatic interaction with Gene1
np.random.seed(1)
p = Parameters(
    algorithm='Moran', 
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(
        initial_size_array=[6, 1, 1, 1, 1],   # Set up 4 1-cell clones and 6 WT cells
    ),
    fitness=FitnessParameters(
        mutation_rates=0.08, 
        mutation_generator=mut_gen,     

        # The initial_mutant_gene_array defines the genes to be associated with the gene
        # The value is the index of the Gene in MutationGenerator.genes
        # Use -1 for the clones with no mutation/associated gene
        initial_mutant_gene_array=[-1, 0, 0, 0, 0],  

        # Also need to define the fitness array or all initial clones will have a fitness=1
        # These fitness values do not have to equal the fitness values from the Gene assigned. 
        fitness_array=[1, 1.1, 1.1, 1.1, 1.1],
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info(include_raw_fitness=True))

	clone id	fitness	generation born	parent clone id	last gene mutated	Initial clone fitness	Gene1	Gene2	Epi1
0	0	1.00	0	-1	None	1.0	NaN	NaN	NaN
1	1	1.10	0	-1	Gene1	1.0	1.1	NaN	NaN
2	2	1.10	0	-1	Gene1	1.0	1.1	NaN	NaN
3	3	1.10	0	-1	Gene1	1.0	1.1	NaN	NaN
4	4	1.10	0	-1	Gene1	1.0	1.1	NaN	NaN
5	5	1.05	22	0	Gene2	1.0	NaN	1.05	NaN
6	6	2.00	32	4	Gene2	1.0	1.1	1.05	2.0
7	7	2.00	39	3	Gene2	1.0	1.1	1.05	2.0
8	8	2.00	66	6	Gene2	1.0	1.1	1.05	2.0
9	9	2.00	73	8	Gene2	1.0	1.1	1.05	2.0
10	10	2.00	83	8	Gene2	1.0	1.1	1.05	2.0
11	11	2.00	86	8	Gene2	1.0	1.1	1.05	2.0

The initial clones (generation_born=0) now can have epistatic interactions.

Diminishing returns

As a clone gains more mutations and becomes fitter, there may be less and less room for improvement.
You can apply a transformation of the combined fitness effects of all mutations to introduce diminishing returns (or other function).

# Run a simulation with loads of mutations that multiply in fitness
mut_gen = MutationGenerator(
    genes=[Gene(name='Gene1', mutation_distribution=FixedValue(1.4), synonymous_proportion=0)], 
    combine_mutations='multiply' 
)

np.random.seed(0)
p = Parameters(
    algorithm='Moran',
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(
        initial_cells=1000
    ),
    fitness=FitnessParameters(
        mutation_rates=0.1, 
        mutation_generator=mut_gen
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info()[-10:])

	clone id	fitness	generation born	parent clone id	last gene mutated
1000	1000	2295.856929	99	985	Gene1
1001	1001	426.878854	99	806	Gene1
1002	1002	304.913467	100	860	Gene1
1003	1003	1639.897806	100	930	Gene1
1004	1004	304.913467	100	841	Gene1
1005	1005	1639.897806	100	930	Gene1
1006	1006	1171.355576	100	886	Gene1
1007	1007	1171.355576	100	891	Gene1
1008	1008	597.630396	100	863	Gene1
1009	1009	2295.856929	100	985	Gene1

You can see that the fitness values of the clones has become very high!

The final clones have many mutations, each multiplying the fitness by 1.4.
This compound interest of mutations can lead to very high fitness

print(s.get_clone_ancestors(1009))

[1009,985,984,886,848,847,791,790,786,735,620,483,470,464,405,390,346,340,339,319,198,116,115,0,-1]

One (extreme) option to limit to growth of fitness values is to use the 'max' option - for this simulation, this would limit the fitness for clones at 1.4.

However, there is another option that enables diminishing returns to be applied.

The BoundedLogisticFitness class places an upper bound on fitness. Here fitness cannot get above 3.

from clone_competition_simulation import BoundedLogisticFitness

b = BoundedLogisticFitness(3)
plt.plot(np.linspace(0, 10, 100), b.fitness(np.linspace(0, 10, 100)))
plt.xlabel('Raw fitness')
plt.ylabel('Transformed fitness')
plt.show()

png

The shape of the fitness transformation can also be altered.


b = BoundedLogisticFitness(3)
plt.plot(np.linspace(0, 10, 100), b.fitness(np.linspace(0, 10, 100)))
b2 = BoundedLogisticFitness(3, 10)
plt.plot(np.linspace(0, 10, 100), b2.fitness(np.linspace(0, 10, 100)))
plt.xlabel('Raw fitness')
plt.ylabel('Transformed fitness')
plt.show()

png

Running a simulation again, but limiting the max fitness.

mut_gen = MutationGenerator(
    genes=[Gene(name='Gene1', mutation_distribution=FixedValue(1.4), synonymous_proportion=0)], 
    combine_mutations='multiply', 
    mutation_combination_class=BoundedLogisticFitness(3)   # This limits fitness to 3
)

np.random.seed(0)
p = Parameters(
    algorithm='Moran',
    times=TimeParameters(max_time=10, division_rate=1),
    population=PopulationParameters(
        initial_cells=1000
    ),
    fitness=FitnessParameters(
        mutation_rates=0.1, 
        mutation_generator=mut_gen
    )
)
s = p.get_simulator()
s.run_sim()

print(s.view_clone_info(include_raw_fitness=True)[-10:])

	clone id	fitness	generation born	parent clone id	last gene mutated	Gene1
1000	1000	2.926558	99	964	Gene1	5.378240
1001	1001	2.926558	99	357	Gene1	5.378240
1002	1002	2.999569	100	464	Gene1	10.541350
1003	1003	2.926558	100	716	Gene1	5.378240
1004	1004	2.926558	100	360	Gene1	5.378240
1005	1005	2.926558	100	691	Gene1	5.378240
1006	1006	2.222816	100	586	Gene1	2.744000
1007	1007	2.999994	100	611	Gene1	14.757891
1008	1008	2.991267	100	454	Gene1	7.529536
1009	1009	2.999569	100	953	Gene1	10.541350

Now those final clones have their fitness capped.
The raw fitness is also lower, since the fitness reduction also reduces the advantage of the fittest clones (which through expansion would gain more subclones and become even fitter)