Computation of Implementation Shortfall for
Algorithmic Trading by Sequence Alignment
Raymond Chan, Kelvin Kan, and Alfred Ma
Raymond Chan is the Dean in College of Science and a Chair Professor in the Department
of Mathematics, City University of Hong Kong. Email: rchan.sci@cityu.edu.hk
Kelvin Kan is a Ph.D. student in the Department of Mathematics, Emory University.
Email: kelvin.kan@emory.edu
Alfred Ma is an Adjunct Professor in the Department of Economics and Finance, Hang
Seng University of Hong Kong. Email: alfredma@hsu.edu.hk
Raymond Chan’s research is supported by HKRGC Grants No. CityU12500915, CityU14306316,
HKRGC CRF Grant C1007-15G, and HKRGC AoE Grant AoE/M-05/12.

Abstract

Implementation shortfall measures the difference in performance between the paper portfolio and the real portfolio, and it is decomposed as a sum of execution cost and opportunity cost. The authors show that the original framework is not directly applicable to algorithmic trading and propose a new framework to compute implementation shortfall and its decomposition. They employ an efficient algorithm inspired by DNA sequence alignment techniques to align the trade records from both portfolios and then compute the implementation shortfall with a breakdown of execution cost and opportunity cost for diagnosis. Their framework is simple, objective, and computationally efficient: the complexity grows only linearly with respect to the numbers of trades of the paper and real portfolios. Hence the framework proposed by the authors in this article is applicable to high frequency trading data.

Keywords: Implementation Shortfall, Algorithmic Trading, Algorithmic Trading System, Backtesting, Sequence Alignment

paper portfolio is (10.20 − 10.12) × 200 = 16 while the profit of the real portfolio is (10.20 − 10.13) × 100 + (10.20 − 10.14) × 50 = 10. The difference (16 − 10 = 6) is the implementation shortfall, which is also the sum of the execution cost and the opportunity cost (2 + 4 = 6). We

Exhibit 2: Paper and real portfolio from algorithmic trading

then consider algorithmic trading as shown in Exhibit 2. In practice, the paper portfolio is obtained from a backtest; see Bailey et al. (2016) for an example. In general, backtesting and trading are implemented in different systems. The paper portfolio is considered the ideal result, while the real portfolio shows the reality. Exhibit 3 lists algorithmic trade records from the paper portfolio and the real portfolio of a particular financial instrument. According to Perold's framework and extensions such as Kissell (2006), the implementation shortfall is measured over periods of no trading in the paper portfolio. While neither addresses how the framework can be applied in our example, it is standard to divide the whole period into subperiods with no trading in the paper portfolio and apply the framework over each subperiod. In the example, the first subperiod has to end strictly before the time stamp of the second paper trade, namely,

Exhibit 3: An example of paper and real portfolios

Paper Portfolio                           Real Portfolio
Time stamp     Price   Volume  Type       Time stamp     Price   Volume  Type
10:23:03.332   10.12   200     Buy        10:23:03.676   10.13   200     Buy
10:23:03.443   10.13   100     Buy        10:23:03.711   10.13   100     Buy
10:23:07.121   10.14   100     Sell       10:23:10.144   10.14   200     Sell
10:23:09.574   10.15   200     Sell

10:23:03.443. The first trade from the real portfolio does not lie in this subperiod, and therefore the implementation shortfall in the first subperiod has only opportunity cost. However, it is more reasonable to conclude that the first two trades in the paper and real portfolios result in execution cost but not opportunity cost. We can overcome this difficulty by allocating trades in the real portfolio to subperiods without considering only the time stamps. However, this flexibility comes with subjectivity, and the results vary among different people. In addition, when the number of trades increases, this subjective task becomes infeasible in practice.

In essence, the problem resembles the DNA sequence alignment problem in bioinformatics. DNA sequence alignment has long been an active research topic in bioinformatics. The pioneering work of Needleman and Wunsch (1970) introduces a simple yet computationally efficient algorithm to globally align two DNA sequences. They propose a dynamic programming approach to avoid repeated calculation in the alignment of subsequences. Since then, many heuristics and variations of the Needleman-Wunsch algorithm (NW algorithm) have been introduced. The space complexity of the NW algorithm is improved by Hirschberg (1975). Smith and Waterman (1981) introduce an algorithm for local sequence alignment. The time and space complexity of their algorithm are optimized by Gotoh (1982) and Myers and Miller (1988) respectively. Altschul and Erickson (1986) further improve the algorithm of Gotoh (1982) by using affine gap costs instead of costs proportional to the length of a gap. While the methods above focus on improving the complexity of the algorithm, some

and $q_j$ are mismatched if $p_i \neq q_j$), $\Delta$ denotes a missing DNA, $p_i \Theta \Delta$ denotes $p_i$ being aligned with a missing DNA, $\gamma$ is a parameter penalizing the alignment with a missing DNA, and $s$ is a function evaluating the score of aligning two DNAs. Since we take the maximum over the scores of the cases in the update of $A_{i,j}$, $A_{i,j}$ stores the score of the best alignment of $p_{1:i}$ with $q_{1:j}$, where $p_{i_1:i_2}$ denotes the subsequence of $p$ with consecutive DNAs from $p_{i_1}$ to $p_{i_2}$, and $A_{M,N}$ stores the score of the best alignment of $p$ with $q$. After the computation of the score matrix $A$ is completed, we trace back from $A_{M,N}$ to $A_{0,0}$ to construct the best alignment of $p$ with $q$. For more details of the NW algorithm, see Needleman and Wunsch (1970).

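Algorithm 1 Needleman-Wunsch algorithm

Initialize: $M = \mathrm{length}(p)$, $N = \mathrm{length}(q)$, $A_{0,0} = 0$, $A_{i,0} = A_{i-1,0} + \gamma$ for all $i \in \{1, \dots, M\}$, and $A_{0,j} = A_{0,j-1} + \gamma$ for all $j \in \{1, \dots, N\}$.
for $i = 1 : M$ do
    for $j = 1 : N$ do
        $$A_{i,j} = \max \begin{cases} A_{i-1,j-1} + s(p_i, q_j), & \text{Case: } p_i \Theta q_j \text{ or } p_i \Phi q_j, \quad \text{(1a)} \\ A_{i-1,j} + \gamma, & \text{Case: } p_i \Theta \Delta, \quad \text{(1b)} \\ A_{i,j-1} + \gamma, & \text{Case: } \Delta \Theta q_j. \quad \text{(1c)} \end{cases}$$
    end for
end for
Trace back from $A_{M,N}$ to $A_{0,0}$ to construct the best alignment of $p$ with $q$.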
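Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `needleman_wunsch`, the scoring function (+1 for equal letters, -1 otherwise), and the gap value -1 are all choices made for this example.

```python
def needleman_wunsch(p, q, score, gap):
    """Globally align p with q; return (best score, list of aligned pairs).

    A pair (x, None) or (None, y) means the entry is aligned with a missing one.
    """
    M, N = len(p), len(q)
    # A[i][j] holds the best score for aligning p[:i] with q[:j].
    A = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        A[i][0] = A[i - 1][0] + gap
    for j in range(1, N + 1):
        A[0][j] = A[0][j - 1] + gap
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            A[i][j] = max(
                A[i - 1][j - 1] + score(p[i - 1], q[j - 1]),  # case (1a)
                A[i - 1][j] + gap,                            # case (1b)
                A[i][j - 1] + gap,                            # case (1c)
            )
    # Trace back from A[M][N] to A[0][0], preferring the diagonal move on ties.
    pairs = []
    i, j = M, N
    while i > 0 or j > 0:
        if i > 0 and j > 0 and A[i][j] == A[i - 1][j - 1] + score(p[i - 1], q[j - 1]):
            pairs.append((p[i - 1], q[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and A[i][j] == A[i - 1][j] + gap:
            pairs.append((p[i - 1], None))
            i -= 1
        else:
            pairs.append((None, q[j - 1]))
            j -= 1
    pairs.reverse()
    return A[M][N], pairs


def letter_score(x, y):
    # Assumed scoring for the example: reward equal letters, penalize others.
    return 1 if x == y else -1


best, alignment = needleman_wunsch("ACGT", "AGT", letter_score, gap=-1)
print(best, alignment)
```

With these assumed scores, the only way to reach the best score is to match A, G and T and to align C with a missing entry, which gives three matches and one gap.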

Stage 1: Trading Sequences Alignment

In this section, we describe the algorithm for aligning records from the real portfolio with records from the paper portfolio. For simplicity, we consider one financial asset. (In some cases, it is helpful to align trades of different assets; the modification is straightforward.)

Description and Notation of Trading Sequences

Existing literature has investigated backtesting algorithms on different types of data. For example, Löw, Maier-Paape, and Platen (2015) and Maier-Paape and Platen (2016) investigate backtesting on historical candle data, Hurlin, Colletaz, and Tokpavi (2007) and Dionne, Duchesne, and Pacurar (2009) investigate backtesting on tick-by-tick data, and Wang, Rostoker, and Wagner (2009) investigate backtesting on bid-ask tick data. Regardless of the backtest system chosen and the type of data, the trade records of a paper portfolio can be represented by $b$, with $b_i$ denoting the $i$-th trade record. Each trade record $b_i$ is a vector of the form:

$$b_i = [t^b_i, p^b_i, v^b_i, y^b_i], \tag{2}$$

where $t^b_i$ is the time when the trade is executed, $p^b_i$ is the trading price, $v^b_i$ is the trading volume, and $y^b_i$ is the type of the trade, i.e. buy or sell. Similarly, we use $l$ to represent the trade records of the real portfolio, where $l_j$ has the form:

$$l_j = [t^l_j, p^l_j, v^l_j, y^l_j]. \tag{3}$$
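The record layout in (2) and (3) maps naturally onto a small data type. The sketch below is only one possible representation: the names `TradeRecord` and `parse_time`, and the choice of seconds since midnight as the time unit, are assumptions made for illustration. The sample values are the first paper and real trades from Exhibit 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeRecord:
    """One trade record [t, p, v, y] as in (2) and (3)."""
    time: float    # execution time, in seconds since midnight (assumed unit)
    price: float   # trading price
    volume: float  # trading volume
    side: str      # type of the trade: "Buy" or "Sell"


def parse_time(stamp: str) -> float:
    """Convert a time stamp such as '10:23:03.332' to seconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# First paper trade b_1 and first real trade l_1 from Exhibit 3.
b1 = TradeRecord(parse_time("10:23:03.332"), 10.12, 200, "Buy")
l1 = TradeRecord(parse_time("10:23:03.676"), 10.13, 200, "Buy")

# The two records are "matched" in the sense used below when their types agree.
print(b1.side == l1.side, round(l1.time - b1.time, 3))
```

Storing the time as a number rather than a string keeps the time differences used in the alignment scores a simple subtraction.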

Trading Sequence Alignment Algorithm

Each entry in a DNA sequence is represented by a single letter and hence is one-dimensional. However, a trade record is four-dimensional, as shown in (2) and (3). In view of the dimensionality and the possible discrepancies between the two trading sequences, we develop a trading sequence alignment algorithm modified from the NW algorithm. The trading sequence alignment algorithm is presented in Algorithm 2, where $A$ and $D$ are two matrices storing the scores of the best alignments and the trading volume imbalances respectively, $b_i \Theta l_j$ denotes two matched trade records $b_i$ and $l_j$ being aligned ($b_i$ and $l_j$ are matched if $y^b_i = y^l_j$), $b_i \Phi l_j$ denotes two mismatched trade records $b_i$ and $l_j$ being aligned ($b_i$ and $l_j$ are mismatched if $y^b_i \neq y^l_j$), $b_i \Theta \Delta$ denotes $b_i$ being aligned with a missing trade record, $\gamma_m$ is a parameter

our algorithm should allow one $b_i$ to be aligned with multiple $l_j$'s. The matrix $D$ stores the trading volume imbalances of the alignment of $b$ and $l$. More specifically, $D_{i,j}$ stores the trading volume imbalance of the last match of the best alignment of $b_{1:i}$ and $l_{1:j}$. In (4b), we can then compute the reward of reducing the volume imbalance using the value of $D_{i,j}$. There are three cases in the update of $D_{i,j}$, see (5a)–(5c). The update of $D_{i,j}$ depends on which case we have in the update of $A_{i,j}$. We explain the cases separately in the following:

  1. For (5a), two matched trade records $b_i$ and $l_j$ are aligned. If $D_{i,j} > 0$, then there is more trading volume in $b_i$ than in $l_j$, and vice versa. If $D_{i,j} = 0$, then there is no trading volume imbalance.
  2. For (5b), one live trading record $l_j$ is added to the alignment containing $b_i$ and $l_{j-1}$. Since the trading volume imbalance in the alignment of $b_i$ and $l_{j-1}$ is already stored in $D_{i,j-1}$, we calculate $D_{i,j}$ by subtracting the trading volume of $l_j$ from $D_{i,j-1}$.
  3. For (5c), two mismatched trade records $b_i$ and $l_j$ are aligned, or a trade record is aligned with a missing trade record. In either case, there is no trading volume imbalance, so we set $D_{i,j}$ to 0.
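The three cases above can be sketched as a single update function. Equations (5a)–(5c) themselves are not reproduced in this excerpt, so the code follows the verbal description only: the formula used for (5a), paper volume minus live volume, is an assumption inferred from the sign convention in item 1, and the function name and case labels are hypothetical.

```python
def update_imbalance(case, paper_volume, live_volume, previous_imbalance):
    """Return D[i][j] given which case was chosen in the update of A[i][j].

    case is one of (hypothetical labels):
      "match"  - b_i and l_j are matched and aligned            (5a)
      "extend" - l_j is added to the alignment of b_i, l_{j-1}  (5b)
      "other"  - mismatch or alignment with a missing record    (5c)
    """
    if case == "match":
        # Assumed form of (5a): positive when b_i has more volume than l_j.
        return paper_volume - live_volume
    if case == "extend":
        # (5b): subtract the volume of l_j from the stored imbalance D[i][j-1].
        return previous_imbalance - live_volume
    # (5c): no trading volume imbalance.
    return 0


# A paper buy of 5 units filled by two live buys of 2 units each.
after_first_fill = update_imbalance("match", 5, 2, None)
after_second_fill = update_imbalance("extend", 5, 2, after_first_fill)
print(after_first_fill, after_second_fill)
```

In this example one unit of the paper order remains unfilled after both live trades, which is the kind of residual that later shows up as under-trading.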

Score Matrix A

Score matrix $A$ stores the scores of the best alignment of the two trading sequences. There are three cases in the update of $A_{i,j}$, see (4a)–(4c). The maximum score among the three cases is used to update $A_{i,j}$. We explain the cases separately in the following:

  1. For (4a), two trade records $b_i$ and $l_j$ are aligned. If the two trade records are matched, we assign a score whose magnitude depends on the similarity between the two trade records; if they are mismatched, we assign a penalty. The following function $m(b_i, l_j)$ is used to evaluate the score:

$$m(b_i, l_j) = \begin{cases} 1 - \lambda_t |t^b_i - t^l_j| - \lambda_p \dfrac{|p^b_i - p^l_j|}{p^b_i + p^l_j} - \lambda_v \dfrac{|v^b_i - v^l_j|}{|v^b_i + v^l_j|}, & \text{if } y^b_i = y^l_j, \quad \text{(6a)} \\ -\infty, & \text{if } y^b_i \neq y^l_j, \quad \text{(6b)} \end{cases}$$

where $\lambda_t$, $\lambda_p$ and $\lambda_v$ are the weights on time, price, and volume respectively. Recall that $y_i$, $t_i$, $p_i$, and $v_i$ are defined in (2) and (3). Since it is irrational to align two mismatched trade records, the output of $m$ is $-\infty$ if the two trade records are mismatched, i.e. $y^b_i \neq y^l_j$. Hence two mismatched records will not be aligned. If $b_i$ and $l_j$ are matched, $m(b_i, l_j)$ returns a value that depends on how similar the trade records $b_i$ and $l_j$ are: the more similar they are, the larger $m(b_i, l_j)$ is.

  2. In case (4b), the $j$-th live trading record $l_j$ is added to the alignment of $b_i$ with $l_{j-1}$, or $l_j$ is aligned with a missing trade record. A function $g$ is used to evaluate the score in this case, and it is given by the following:

$$g(D_{i,j-1}, b_i, l_j) = \begin{cases} h(D_{i,j-1}, b_i, l_j), & \text{if } D_{i,j-1} > 0 \text{ and } y^b_i = y^l_j, \quad \text{(7a)} \\ \gamma_m, & \text{if } D_{i,j-1} \leq 0 \text{ or } y^b_i \neq y^l_j, \quad \text{(7b)} \end{cases}$$

where the function $h$ is given by:

$$h(D_{i,j-1}, b_i, l_j) = \max \begin{cases} -\gamma_t |t^b_i - t^l_j| - \gamma_p \dfrac{|p^b_i - p^l_j|}{p^b_i + p^l_j} + \gamma_v f(D_{i,j-1}, v^l_j), & \text{(8a)} \\ \gamma_m, & \text{(8b)} \end{cases}$$

with

$$f(D_{i,j-1}, v^l_j) = \begin{cases} \dfrac{v^l_j - D_{i,j-1}}{D_{i,j-1}}, & \text{if } v^l_j > D_{i,j-1}, \quad \text{(9a)} \\ 0, & \text{if } v^l_j \leq D_{i,j-1}, \quad \text{(9b)} \end{cases}$$

where $\gamma_t$, $\gamma_p$ and $\gamma_v$ are the weights on time, price, and volume imbalance respectively. We recall that $\gamma_m$ is the weight of alignment with a missing trade record.
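The scoring functions in (6)–(9) can be transcribed into Python as a sketch. Several things here are assumptions rather than part of the text: the names `Trade`, `match_score`, `overshoot` and `extend_score`, the use of seconds as the time unit, and the value -1.0 passed for $\gamma_m$ in the example, which is purely illustrative. The defaults for the $\lambda$ weights are the values quoted in the Experimental Results section. The code follows the equations as they read in this excerpt.

```python
import math
from typing import NamedTuple


class Trade(NamedTuple):
    time: float    # seconds (assumed unit)
    price: float
    volume: float
    side: str      # "Buy" or "Sell"


def match_score(b, l, lam_t=1.0, lam_p=2.0, lam_v=1.0):
    """m(b_i, l_j) from (6a)-(6b); assumes the volumes do not sum to zero."""
    if b.side != l.side:
        return -math.inf  # (6b): mismatched records are never aligned
    return (1.0
            - lam_t * abs(b.time - l.time)
            - lam_p * abs(b.price - l.price) / (b.price + l.price)
            - lam_v * abs(b.volume - l.volume) / abs(b.volume + l.volume))


def overshoot(d_prev, live_volume):
    """f(D_{i,j-1}, v_j^l) from (9a)-(9b); only called with d_prev > 0."""
    if live_volume > d_prev:
        return (live_volume - d_prev) / d_prev
    return 0.0


def extend_score(d_prev, b, l, gam_t, gam_p, gam_v, gam_m):
    """g from (7a)-(7b), with h from (8a)-(8b) evaluated inline."""
    if d_prev > 0 and b.side == l.side:
        candidate = (-gam_t * abs(b.time - l.time)
                     - gam_p * abs(b.price - l.price) / (b.price + l.price)
                     + gam_v * overshoot(d_prev, l.volume))  # (8a)
        return max(candidate, gam_m)                         # h = max of (8a), (8b)
    return gam_m                                             # (7b)


paper = Trade(0.0, 100.0, 5.0, "Buy")
late_fill = Trade(0.5, 100.0, 5.0, "Buy")
print(match_score(paper, late_fill))
print(extend_score(3.0, paper, Trade(0.0, 100.0, 2.0, "Buy"), 1.0, 2.0, 1.0, -1.0))
```

In the first call only the half-second delay reduces the score. In the second, a 2-unit fill against a remaining imbalance of 3 at the same time and price incurs no time, price or overshoot term.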

Stage 2: Breakdown of Implementation Shortfall

Given the alignment results obtained in the first stage, one can break down the implementation shortfall into its execution cost and opportunity cost. The breakdown is similar to the implementation shortfall breakdown approach proposed in Perold (1988). Yet our proposed method first aligns the two sequences, with multiple transactions in both live trading and backtesting, while Perold's approach assumes the trading period lies between two transactions of the backtesting sequence. In this section, we present the derivation of the implementation shortfall breakdown.

After the first stage, we obtain sets of aligned trade records, where each set contains aligned backtesting trade records and live trading records. Note that each set of aligned trade records can contain multiple live trading records but only one backtesting trade record. Let $P^l_{i,j}$ and $V^l_{i,j}$ be the price and volume of the $j$-th trade record of the $i$-th aligned set of trade records in live trading respectively, $P^b_i$ and $V^b_i$ be the price and volume of the $i$-th set of aligned trade records in backtesting respectively, $N$ be the total number of sets of aligned trade records, and $M_i$ be the number of live trading records in the $i$-th set of aligned trade records. The implementation shortfall $S$ can be expressed as the mark-to-market profit and loss difference between the real and paper portfolios:

$$S = \sum_{i=1}^{N+1} \left( \sum_{j=1}^{M_i} P^l_{i,j} V^l_{i,j} - P^b_i V^b_i \right). \tag{10}$$

Here, for simplicity, we assume all positions are unwound in one trade record at the end of the evaluation period for both portfolios, hence we set $M_{N+1} = 1$. In particular, $P^l_{N+1,1}$ and $P^b_{N+1}$ are both equal to the asset price at the end of the period, and $V^l_{N+1,1}$ and $V^b_{N+1}$ are equal to the trading volumes needed to entirely unwind the positions in the live trading portfolio and the backtesting portfolio respectively at the end of the period, i.e. $V^l_{N+1,1} = -\sum_{i=1}^{N} \sum_{j=1}^{M_i} V^l_{i,j}$ and $V^b_{N+1} = -\sum_{i=1}^{N} V^b_i$.

We remark that in (10), the volumes are set to 0 for all missing trade records, and their prices are set to the corresponding prices in the aligned trade records. Rearranging the right-hand side of (10), we obtain:

$$\begin{aligned} S &= \sum_{i=1}^{N+1} \left( \sum_{j=1}^{M_i} P^l_{i,j} V^l_{i,j} - \sum_{j=1}^{M_i} P^b_i V^l_{i,j} + \sum_{j=1}^{M_i} P^b_i V^l_{i,j} - P^b_i V^b_i \right) \\ &= \sum_{i=1}^{N+1} \sum_{j=1}^{M_i} \left( P^l_{i,j} - P^b_i \right) V^l_{i,j} + \sum_{i=1}^{N+1} P^b_i \left( \sum_{j=1}^{M_i} V^l_{i,j} - V^b_i \right). \end{aligned} \tag{11}$$

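Substituting $V^l_{N+1,1} = -\sum_{i=1}^{N} \sum_{j=1}^{M_i} V^l_{i,j}$, $V^b_{N+1} = -\sum_{i=1}^{N} V^b_i$, and $P^l_{N+1,1} = P^b_{N+1}$ into the right-hand side of (11), the $i = N+1$ term of the first sum vanishes because the two end-of-period prices coincide, and the $i = N+1$ term of the second sum becomes $-P^b_{N+1} \sum_{i=1}^{N} \left( \sum_{j=1}^{M_i} V^l_{i,j} - V^b_i \right)$. After simplification and rearrangement, we have:

$$S = \underbrace{\sum_{i=1}^{N} \sum_{j=1}^{M_i} \left( P^l_{i,j} - P^b_i \right) V^l_{i,j}}_{\text{Execution cost}} + \underbrace{\sum_{i=1}^{N} \left( P^b_i - P^b_{N+1} \right) \left( \sum_{j=1}^{M_i} V^l_{i,j} - V^b_i \right)}_{\text{Opportunity cost}}. \tag{12}$$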
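As a numerical check of the two expressions for $S$, the sketch below evaluates both the direct definition with end-of-period unwinding and the execution/opportunity split on the small example from the introduction: a paper buy of 200 at 10.12, real buys of 100 at 10.13 and 50 at 10.14, and an end-of-period price of 10.20. Treating those trades as a single aligned set is an assumption made here for illustration, and the function names and the dictionary layout are hypothetical.

```python
import math


def shortfall_direct(sets, end_price):
    """S as the cash-flow difference between real and paper portfolios,
    with both positions unwound at end_price."""
    live_cash = sum(p * v for s in sets for p, v in s["live"])
    live_position = sum(v for s in sets for _, v in s["live"])
    paper_cash = sum(s["paper"][0] * s["paper"][1] for s in sets)
    paper_position = sum(s["paper"][1] for s in sets)
    # The unwinding trades have volume -position at end_price.
    return ((live_cash - end_price * live_position)
            - (paper_cash - end_price * paper_position))


def execution_and_opportunity(sets, end_price):
    """Execution cost and opportunity cost summed over the aligned sets."""
    execution = 0.0
    opportunity = 0.0
    for s in sets:
        paper_price, paper_volume = s["paper"]
        execution += sum((p - paper_price) * v for p, v in s["live"])
        live_volume = sum(v for _, v in s["live"])
        opportunity += (paper_price - end_price) * (live_volume - paper_volume)
    return execution, opportunity


# One aligned set: (price, volume) for the paper trade and for each real trade.
sets = [{"paper": (10.12, 200), "live": [(10.13, 100), (10.14, 50)]}]
end_price = 10.20

total = shortfall_direct(sets, end_price)
execution, opportunity = execution_and_opportunity(sets, end_price)
print(round(total, 6), round(execution, 6), round(opportunity, 6))
```

Up to floating-point rounding, the results should agree with the figures quoted in the introduction: an implementation shortfall of 6, split into an execution cost of 2 and an opportunity cost of 4.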

The breakdown is interpreted as follows. In the first term, $(P^l_{i,j} - P^b_i)$ is the cost of trading at the price $P^l_{i,j}$ instead of $P^b_i$, and $V^l_{i,j}$ is the trading volume on which this cost is incurred. In the second term, $\left( \sum_{j=1}^{M_i} V^l_{i,j} - V^b_i \right)$ is the volume of an unexecuted trade and $(P^b_i - P^b_{N+1})$ is the net return of the unexecuted trade. The execution cost can be further broken down into delay cost, which is the cost caused by delayed execution, and market impact. The opportunity cost can be further broken down into over-trade cost and under-trade cost. The breakdown is given as follows:

$$\begin{aligned} S = {} & \underbrace{\sum_{i=1}^{N} \left( P^l_{i,1} - P^b_i \right) V^l_{i,1}}_{\text{Delay cost}} + \underbrace{\sum_{i=1}^{N} \sum_{j=2}^{M_i} \left( P^l_{i,j} - P^b_i \right) V^l_{i,j}}_{\text{Market impact}} \\ & + \underbrace{\sum_{i=1}^{N} \left( P^b_i - P^b_{N+1} \right) \left( \sum_{j=1}^{M_i} V^l_{i,j} - V^b_i \right) \mathbf{1}_{\left| \sum_j V^l_{i,j} \right| > \left| V^b_i \right|}}_{\text{Over-trade cost}} \\ & + \underbrace{\sum_{i=1}^{N} \left( P^b_i - P^b_{N+1} \right) \left( \sum_{j=1}^{M_i} V^l_{i,j} - V^b_i \right) \mathbf{1}_{\left| \sum_j V^l_{i,j} \right| < \left| V^b_i \right|}}_{\text{Under-trade cost}}. \end{aligned} \tag{13}$$
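The four-way breakdown can be sketched in Python and checked against the aligned sets listed in Exhibit 6, with the end-of-period price 28390 from the Experimental Results section. The function name `breakdown` and the tuple layout are hypothetical; the prices and signed volumes are copied from Exhibit 6, where a missing record carries volume 0 and the price of the record it is aligned with.

```python
def breakdown(sets, end_price):
    """Return (delay cost, market impact, over-trade cost, under-trade cost).

    Each set is (paper_price, paper_volume, [(live_price, live_volume), ...]).
    """
    delay = impact = over = under = 0.0
    for paper_price, paper_volume, live in sets:
        for k, (price, volume) in enumerate(live):
            cost = (price - paper_price) * volume
            if k == 0:
                delay += cost    # first live record of the set
            else:
                impact += cost   # remaining live records of the set
        live_volume = sum(volume for _, volume in live)
        opportunity = (paper_price - end_price) * (live_volume - paper_volume)
        if abs(live_volume) > abs(paper_volume):
            over += opportunity
        elif abs(live_volume) < abs(paper_volume):
            under += opportunity
    return delay, impact, over, under


# Aligned sets 1 to 9 from Exhibit 6 (sell volumes are negative).
exhibit6 = [
    (28387, 1, [(28387, 0)]),                 # live record missing
    (28387, 2, [(28387.5, 2)]),
    (28389, -4, [(28389, -2), (28387, -2)]),
    (28390, 0, [(28390, -1)]),                # paper record missing
    (28394, 5, [(28394, 2), (28395, 2)]),
    (28396, -1, [(28393, -1)]),
    (28400, 7, [(28402, 5)]),
    (28398, 4, [(28405, 3)]),
    (28400, 0, [(28400, 2)]),                 # paper record missing
]

delay, impact, over, under = breakdown(exhibit6, end_price=28390)
print(delay, impact, over, under, delay + impact + over + under)
```

The four totals reproduce the bottom row of Exhibit 6 (35, 6, 20 and -29), and their sum, 32, equals the gap between the paper profit and loss of -111 and the real profit and loss of -143.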

rare in algorithmic trading implementation. A high over-trade cost should draw attention to potential human intervention during live trading. Our framework provides a valuable tool for continuous monitoring of algorithmic trading strategies. For more detailed discussions on the diagnosis of algorithmic trading implementation, see Harris (2008, Chapter 6), Chan (2009, Chapter 3) and Pardo (2008, Chapter 6).

Experimental Results

In this section, we apply our alignment algorithm to illustrative live trading and backtesting data and show how one can calculate the implementation shortfall based on the results of our alignment algorithm. Exhibit 4 shows the trading sequences generated by backtesting and live trading respectively. Exhibit 5 shows an alignment result generated by our algorithm, where the end-of-evaluation-period price is set to 28390, i.e., $P^l_{N+1,1} = P^b_{N+1} = 28390$, and the parameters are $\lambda_t = 1$, $\lambda_p = 2$, $\lambda_v = 1$, $\gamma_t = 1$, $\gamma_p = 2$, $\gamma_v = 1$ and $\gamma_m = 1$. Exhibit 6 shows another alignment result, where the parameters are the same as in Exhibit 5 except that $\gamma_t = 3$, i.e. we put more penalty on the time difference between trade records when aligning multiple trade records. With more penalty on the time difference, we obtain different alignment results (marked in bold-face font) and hence a different implementation shortfall breakdown. Each row of Exhibit 5 and Exhibit 6 contains one set of aligned trade records.

Conclusion

We blend an efficient algorithm from bioinformatics with the classic implementation shortfall to create a useful tool for evaluating algorithmic trading implementation. The framework is applicable to high frequency trading since the complexity grows only linearly with respect to the number of trades in each trading sequence. Moreover, it is simple to implement and it provides an objective way to analyze implementation shortfall in algorithmic trading. The breakdown of costs resulting from the framework also gives astute insights into the algorithmic trading implementation.

Exhibit 4: Live trading and backtesting results

Backtesting                               Live trading
Time Stamp     Price   Volume  Type       Time Stamp     Price     Volume  Type
10:00:00.122   28387    1      Buy        10:00:03.008   28387.5    2      Buy
10:00:02.811   28387    2      Buy        10:00:04.154   28389     −2      Sell
10:00:04.115   28389   −4      Sell       10:00:04.417   28387     −2      Sell
10:00:11.899   28394    5      Buy        10:00:08.593   28390     −1      Sell
10:00:13.563   28396   −1      Sell       10:00:12.018   28394      2      Buy
10:00:15.552   28400    7      Buy        10:00:12.080   28395      2      Buy
10:00:16.237   28398    4      Buy        10:00:14.026   28393     −1      Sell
                                          10:00:15.833   28402      5      Buy
                                          10:00:16.005   28405      3      Buy
                                          10:00:16.737   28400      2      Buy

Exhibit 6: Alignment and implementation shortfall breakdown on the paper trading and real trading data with $\gamma_t = 3$. Time stamps are shown to the second; a blank time stamp marks a missing trade record.

      Paper trading                        Real trading                           Delay   Market   Over-trade   Under-trade
Set   Time Stamp   Price   Volume  Type    Time Stamp   Price     Volume  Type    Cost    Impact   Cost         Cost
1     10:00:00     28387    1      Buy                  28387      0              0       0        0            3
2     10:00:02     28387    2      Buy     10:00:03     28387.5    2      Buy     1       0        0            0
3     10:00:04     28389   −4      Sell    10:00:04     28389     −2      Sell    0       4        0            0
                                           10:00:04     28387     −2      Sell
4                  28390    0              10:00:08     28390     −1      Sell    0       0        0            0
5     10:00:11     28394    5      Buy     10:00:12     28394      2      Buy     0       2        0            −4
                                           10:00:12     28395      2      Buy
6     10:00:13     28396   −1      Sell    10:00:14     28393     −1      Sell    3       0        0            0
7     10:00:15     28400    7      Buy     10:00:15     28402      5      Buy     10      0        0            −20
8     10:00:16     28398    4      Buy     10:00:16     28405      3      Buy     21      0        0            −8
9                  28400    0              10:00:16     28400      2      Buy     0       0        20           0

Total profit and loss (paper trading) = −111. Total profit and loss (real trading) = −143.
Totals: delay cost = 35, market impact = 6, over-trade cost = 20, under-trade cost = −29.

References

Altschul, S. F., and B. W. Erickson. 1986. “Optimal sequence alignment using affine gap costs.” Bulletin of Mathematical Biology 48 (5-6): 603–616.

Bailey, D. H., J. Borwein, M. López de Prado, and Q. J. Zhu. 2016. “The probability of backtest overfitting.” Journal of Computational Finance 20 (4): 39–69.

Bertsimas, D., and A. Lo. 1998. “Optimal control of execution costs.” Journal of Financial Markets 1 (1): 1–50.

Chan, E. 2009. Quantitative Trading: How to Build Your Own Algorithmic Trading Business. Hoboken, N.J.: John Wiley.

Corpet, F. 1988. “Multiple sequence alignment with hierarchical clustering.” Nucleic Acids Research 16 (22): 10881–10890.

Dionne, G., P. Duchesne, and M. Pacurar. 2009. “Intraday Value at Risk (IVaR) using tick-by-tick data with application to the Toronto Stock Exchange.” Journal of Empirical Finance 16 (5): 777–792.

Gotoh, O. 1982. “An improved algorithm for matching biological sequences.” Journal of Molecular Biology 162 (3): 705–708.

Harris, M. 2008. Profitability and Systematic Trading: A Quantitative Approach to Profitability, Risk, and Money Management. Hoboken, N.J.: John Wiley.

Hendershott, T., C. M. Jones, and A. J. Menkveld. 2013. “Implementation shortfall with transitory price effects.” In High Frequency Trading: New Realities for Trades, Markets, and Regulators, edited by D. Easley, M. López de Prado, and M. O'Hara, 185–206. Risk Books.

Hirschberg, D. S. 1975. “A linear space algorithm for computing maximal common subsequences.” Communications of the ACM 18 (6): 341–343.

Huang, X., and D. L. Brutlag. 2006. “Dynamic use of multiple parameter sets in sequence alignment.” Nucleic Acids Research 35 (2): 678–686.

Huang, X., and K.-M. Chao. 2003. “A generalized global alignment algorithm.” Bioinformatics 19 (2): 228–233.

Hurlin, C., G. Colletaz, and S. Tokpavi. 2007. Irregularly Spaced Intraday Value at Risk (ISIVaR) Models: Forecasting and Predictive Abilities. Technical report. Hyper Articles en Ligne.

Katoh, K., K. Misawa, K.-i. Kuma, and T. Miyata. 2002. “MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.” Nucleic Acids Research 30 (14): 3059–3066.