A Real-Time Algorithm for Searching for Substrings of a Pattern in Online Text Stream: An Alternative to Suffix Trees and Finite Automaton

Authors

  • Vinod Prasad UTAS Sur, Sultanate of Oman

DOI:

https://doi.org/10.14738/tecs.114.15065

Keywords:

Approximate String Matching, Similarity Search, Substring Search

Abstract

In the field of similarity search, very little effort has been made to process the text without using suffix trees or automata. Most of the algorithms developed in the past use data structures and methods outlined above to accelerate the search process. However, their costly maintenance has always been a cause of concern. Let, T be a text of size N, and P be a pattern of size M. We present highly practical algorithms to process a real-time stream of text characters. The algorithm reports all substrings of pattern P in the text stream T as they appear in the stream. This is highly useful in detecting patterns in the data stream in real-time, for example, viruses and malware can be detected as they appear in the data stream. Additionally, the algorithm can be used to find the longest common substring(s) between two given strings. We achieved the stated goals without using any sophisticated data structure such as a suffix tree or finite automaton. For an alphabet size σ the algorithm runs in O(M) space, and O (MN/σ) expected time, which simplifies to O(N) for M ≤ σ. We also provide experimental results to support our claim.

Downloads

Published

2023-09-02

How to Cite

Prasad, V. (2023). A Real-Time Algorithm for Searching for Substrings of a Pattern in Online Text Stream: An Alternative to Suffix Trees and Finite Automaton. Transactions on Engineering and Computing Sciences, 11(4), 158–171. https://doi.org/10.14738/tecs.114.15065