## Longest Common Subsequence Over Constant-Sized Alphabets: Beating the Naive Approximation Ratio

dc.contributor.advisor | Williams, Virginia Vassilevska | |

dc.contributor.advisor | Williams, Ryan | |

dc.contributor.author | Akmal, Shyan | |

dc.date.accessioned | 2022-02-07T15:25:35Z | |

dc.date.available | 2022-02-07T15:25:35Z | |

dc.date.issued | 2021-09 | |

dc.date.submitted | 2021-09-21T19:54:11.595Z | |

dc.identifier.uri | https://hdl.handle.net/1721.1/140127 | |

dc.description.abstract | This thesis investigates the approximability of the Longest Common Subsequence (LCS) problem. The fastest known algorithm for solving the LCS problem runs in essentially quadratic time in the length of the input, and it is known that under the Strong Exponential Time Hypothesis there can be no polynomial improvement over this quadratic running time. No similar limitation holds, however, for approximate computation of the LCS, except in certain restricted scenarios. When the two input strings come from an alphabet of size k, returning the subsequence formed by the most frequent symbol occurring in both strings achieves a 1/k approximation for the LCS. It is an open problem whether a better-than-1/k approximation can be achieved in truly subquadratic time (that is, O(n^{2-δ}) time for some constant δ > 0). A recent result [Rubinstein and Song SODA'2020] shows that a 1/2+ε approximation for the LCS over a binary alphabet is possible in truly subquadratic time, provided the input strings have the same length. In this thesis we show that if for some ε > 0 a 1/2+ε approximation is achievable for binary LCS in truly subquadratic time when the input strings can have differing lengths, then for every constant k there exists some δ_k > 0 such that there is a truly subquadratic time algorithm that achieves a 1/k+δ_k approximation for LCS over a k-ary alphabet. Thus, we show that for constant-factor LCS approximation, the case of binary strings is in some sense the hardest case. We also show that for every constant k, if one is given two strings of equal length over a k-ary alphabet, one can obtain a 1/k+ε approximation for some constant ε > 0 in truly subquadratic time. This extends the Rubinstein and Song result to all alphabets of constant size, and gives the first nontrivial improvement over the naive 1/k approximation for the LCS of strings over alphabets of size k for all k ≥ 3. | |

dc.publisher | Massachusetts Institute of Technology | |

dc.rights | In Copyright - Educational Use Permitted | |

dc.rights | Copyright MIT | |

dc.rights.uri | http://rightsstatements.org/page/InC-EDU/1.0/ | |

dc.title | Longest Common Subsequence Over Constant-Sized Alphabets: Beating the Naive Approximation Ratio | |

dc.type | Thesis | |

dc.description.degree | S.M. | |

dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |

dc.identifier.orcid | https://orcid.org/0000-0002-7266-2041 | |

mit.thesis.degree | Master | |

thesis.degree.name | Master of Science in Electrical Engineering and Computer Science |
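
The naive 1/k approximation mentioned in the abstract can be illustrated with a short sketch. The Python below is not taken from the thesis: the function names `naive_lcs_approx` and `lcs_length` are hypothetical, and the exact routine is the textbook quadratic dynamic program, included only for comparison. The guarantee follows because any common subsequence uses each symbol at most min(count in a, count in b) times, so the LCS length is at most k times the largest such minimum.

```python
from collections import Counter


def naive_lcs_approx(a: str, b: str) -> str:
    """Return a common subsequence consisting of one repeated symbol.

    For each symbol, min(count in a, count in b) copies of it form a
    common subsequence. Taking the best symbol gives a 1/k approximation
    over an alphabet of size k, in time linear in the input length.
    """
    count_a, count_b = Counter(a), Counter(b)
    best_symbol, best_len = "", 0
    for symbol in count_a:
        common = min(count_a[symbol], count_b[symbol])
        if common > best_len:
            best_symbol, best_len = symbol, common
    return best_symbol * best_len


def lcs_length(a: str, b: str) -> int:
    """Exact LCS length via the standard quadratic dynamic program."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    a, b = "0101", "1010"
    approx = naive_lcs_approx(a, b)
    print(approx, len(approx), lcs_length(a, b))
```

On binary inputs the sketch returns a subsequence whose length is at least half the exact LCS length, which is the baseline the results above improve upon.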