Text Nailing is an information extraction method for semi-automatically extracting structured information from unstructured documents. The method allows a human to interactively review small blobs of text out of a large collection of documents and identify potentially informative expressions. The identified expressions can then be used to enhance computational methods that rely on text, as well as advanced natural language processing techniques. The method combines two steps: 1) human interaction with narrative text to identify highly prevalent non-negated expressions, and 2) conversion of all expressions and notes into non-negated, alphabetical-only representations to create homogeneous representations. In traditional machine learning approaches for text classification, a human expert is required to label phrases or entire notes, and then a supervised learning algorithm attempts to generalize the associations and apply them to new data. In contrast, using non-negated distinct expressions eliminates the need for an additional computational method to achieve generalizability.
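To make the two steps concrete, the following Python sketch illustrates the alphabetical-only normalization idea described above. The function name, the example expression "smokes 1 ppd", and the sample note are hypothetical illustrations, not taken from the published implementation.

```python
import re

def to_alpha_only(text: str) -> str:
    """Lowercase the text and keep only the letters a-z,
    producing a homogeneous, alphabetical-only representation."""
    return re.sub(r"[^a-z]", "", text.lower())

# Step 1 (hypothetical): a human reviewer has flagged this non-negated
# expression as indicating current smoking.
expression = to_alpha_only("smokes 1 ppd")      # -> "smokesppd"

# Step 2: normalize a narrative note the same way and look for the expression.
note = "Pt reports he smokes ~1 PPD for the past 10 years."
if expression in to_alpha_only(note):
    print("Note matches the smoking expression")
```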
Chen & Asch 2017 wrote "With machine learning situated at the peak of inflated expectations, we can soften a subsequent crash into a “trough of disillusionment” by fostering a stronger appreciation of the technology’s capabilities and limitations." A letter published in Communications of the ACM, "Beyond brute force", emphasized that a brute force approach may perform better than traditional machine learning algorithms when applied to text. The letter stated "... machine learning algorithms, when applied to text, rely on the assumption that any language includes an infinite number of possible expressions. In contrast, across a variety of medical conditions, we observed that clinicians tend to use the same expressions to describe patients' conditions." In his viewpoint published in June 2018 concerning slow adoption of data-driven findings in medicine, Uri Kartoun, co-creator of Text Nailing states that "...Text Nailing raised skepticism in reviewers of medical informatics journals who claimed that it relies on simple tricks to simplify the text, and leans heavily on human annotation. TN indeed may seem just like a trick of the light at first glance, but it is actually a fairly sophisticated method that finally caught the attention of more adventurous reviewers and editors who ultimately accepted it for publication."
Criticism
The human-in-the-loop process is a way to generate features using domain experts. Using domain experts to devise features is not a novel concept; however, the specific interfaces and method that help the domain experts create the features are most likely novel. In this case the features the experts create are equivalent to regular expressions: removing non-alphabetical characters and matching on "smokesppd" is equivalent to the regular expression /smokes[^a-z]*ppd/. Using regular expressions as features for text classification is not novel. Given these features, the classifier is a threshold set manually by the authors, chosen based on performance on a set of documents. This is still a classifier; it is simply one whose parameter, in this case a threshold, is set by hand. Given the same features and documents, almost any machine learning algorithm should be able to find the same threshold or a better one. The authors note that using support vector machines with hundreds of documents gives inferior performance, but do not specify which features or documents the SVM was trained and tested on. A fair comparison would use the same features and document sets as those used by the manually thresholded classifier.
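The equivalence claimed above can be checked with a short Python sketch. The example notes below are hypothetical, and the regular expression mirrors the one cited in the criticism.

```python
import re

notes = [
    "Patient smokes 1-2 ppd.",            # hypothetical positive note
    "Denies smoking; never smoked.",      # hypothetical negative note
]

for note in notes:
    lowered = note.lower()
    # Approach 1: strip non-alphabetical characters, then match a literal string.
    alpha_only_hit = "smokesppd" in re.sub(r"[^a-z]", "", lowered)
    # Approach 2: the regular expression the criticism treats as equivalent.
    regex_hit = re.search(r"smokes[^a-z]*ppd", lowered) is not None
    print(f"{note!r}: literal={alpha_only_hit}, regex={regex_hit}")
```

Both approaches flag the first note and not the second, which is the point of the criticism: the hand-crafted expressions behave like ordinary regular-expression features.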