A Real-Time Algorithm for Searching for Substrings of a Pattern in Online Text Stream: An Alternative to Suffix Trees and Finite Automaton

Vinod Prasad

doi:10.14738/tecs.114.15065

A Real-Time Algorithm for Searching for Substrings of a Pattern in Online Text Stream: An Alternative to Suffix Trees and Finite Automaton

Authors

Vinod Prasad UTAS Sur, Sultanate of Oman

DOI:

https://doi.org/10.14738/tecs.114.15065

Keywords:

Approximate String Matching, Similarity Search, Substring Search

Abstract

In the field of similarity search, very little effort has been made to process the text without using suffix trees or automata. Most of the algorithms developed in the past use data structures and methods outlined above to accelerate the search process. However, their costly maintenance has always been a cause of concern. Let, T be a text of size N, and P be a pattern of size M. We present highly practical algorithms to process a real-time stream of text characters. The algorithm reports all substrings of pattern P in the text stream T as they appear in the stream. This is highly useful in detecting patterns in the data stream in real-time, for example, viruses and malware can be detected as they appear in the data stream. Additionally, the algorithm can be used to find the longest common substring(s) between two given strings. We achieved the stated goals without using any sophisticated data structure such as a suffix tree or finite automaton. For an alphabet size σ the algorithm runs in O(M) space, and O (MN/σ) expected time, which simplifies to O(N) for M ≤ σ. We also provide experimental results to support our claim.

Downloads

Published

2023-09-02

How to Cite

Prasad, V. (2023). A Real-Time Algorithm for Searching for Substrings of a Pattern in Online Text Stream: An Alternative to Suffix Trees and Finite Automaton. Transactions on Engineering and Computing Sciences, 11(4), 158–171. https://doi.org/10.14738/tecs.114.15065

Download Citation

Issue

Vol. 11 No. 4 (2023): Transactions on Engineering and Computing Sciences

Section

Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

A Real-Time Algorithm for Searching for Substrings of a Pattern in Online Text Stream: An Alternative to Suffix Trees and Finite Automaton

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

License

Author Center

Indexing

Follow Us

Current Issue

Most Read Last week