Three methods for text manipulation are considered here: lexical tokenizers, regular expressions, and abstract syntax trees (ASTs). In this study I focus on tokenizers because they are common in NLP tasks. ASTs, such as those used by David Halter’s “Parso” library, serve mainly to modify syntactic constructs in Python source code.
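To make the difference between the three methods concrete, the following minimal sketch extracts identifier names from one line of Python source in each of the three ways. It uses only the standard-library modules `re`, `tokenize`, and `ast` rather than Parso, and the sample string `SOURCE` and the three helper function names are illustrative assumptions, not part of any library.

```python
import ast
import io
import re
import tokenize

# Illustrative sample input (an assumption made for this sketch).
SOURCE = "total = price * 2  # price in cents"


def names_by_regex(text):
    """Regular expression: match anything shaped like an identifier.

    The pattern has no notion of syntax, so words inside comments
    or string literals are matched as well.
    """
    return re.findall(r"[A-Za-z_]\w*", text)


def names_by_tokenizer(text):
    """Lexical tokenizer: split the text into typed tokens, keep NAME tokens.

    A comment is emitted as a single COMMENT token, so its words
    are not reported as names.
    """
    tokens = tokenize.generate_tokens(io.StringIO(text).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.NAME]


def names_by_ast(text):
    """Abstract syntax tree: parse the text, then collect Name nodes.

    Comments never reach the tree. The result is sorted so it does
    not depend on the traversal order of ast.walk.
    """
    tree = ast.parse(text)
    return sorted(node.id for node in ast.walk(tree) if isinstance(node, ast.Name))


if __name__ == "__main__":
    print("regex:    ", names_by_regex(SOURCE))
    print("tokenizer:", names_by_tokenizer(SOURCE))
    print("ast:      ", names_by_ast(SOURCE))
```

On this input the regular expression also returns the words from the trailing comment, while the tokenizer and the AST report only `total` and `price`. The tokenizer keeps the original order and position of each token, which is one reason it suits token-level processing; the AST discards layout in favor of structure.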