Preprocessor Component
Input: String (HTML file)
Output: String (raw text)
Removes HTML tags and performs some other preprocessing,
probably using the CoreNLP CleanXML annotator: https://stanfordnlp.github.io/CoreNLP/cleanxml.html
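
To make the String-to-String contract concrete, here is a minimal sketch of the tag-removal step. It is a stand-in built on the Python standard library (html.parser), not the CoreNLP CleanXML annotator mentioned above. The names preprocess and _TextExtractor are hypothetical, and the choices to skip script/style contents and collapse whitespace are assumptions about what "some other preprocessing" could include.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>.

    Hypothetical helper for illustration; not part of CoreNLP.
    """

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into "&".
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces so text from adjacent elements does not
        # run together, then collapse whitespace left by removed markup.
        # Trade-off: a word with inline markup inside it would be split.
        return " ".join(" ".join(self._chunks).split())


def preprocess(html: str) -> str:
    """Input: String (HTML file). Output: String (raw text)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, preprocess("<p>Hello &amp; <b>world</b></p>") returns "Hello & world". If the component ends up using CoreNLP, this function would be replaced by a pipeline call while keeping the same input and output types.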
Edited by Stefan Heid