Identifying Health Information Technology Related Safety Event Reports from Patient Safety Event Report Databases.
Citation: Journal of Biomedical Informatics. 2018 Sep 10.
PMID: 30213556
Institution: MedStar Institute for Innovation
Department: National Center for Human Factors in Healthcare
Form of publication: Journal Article
Medline article type(s): Journal Article
Subject headings: IN PROCESS -- NOT YET INDEXED
Year: 2018
ISSN: 1532-0464
CONCLUSION: The feature-constraint model provides a way to identify HIT-related patient safety hazards that can be applied across healthcare systems whose PSE report structures vary.
Copyright (c) 2018. Published by Elsevier Inc.
DISCUSSION: A difference-based approach to scoring, prioritizing, and selecting features can be used to generate simplified models with high performance. A feature-constraint model may also be easier to share among healthcare organizations seeking to analyze their own datasets, and easier to customize for local variations in PSE reporting practices.
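The abstract does not specify how the feature-constraint model is built. One plausible reading, sketched here purely as an assumption, is a logistic regression whose inputs are restricted to a fixed, shareable vocabulary: any word outside that list is ignored, so organizations could exchange the list itself and fit the weights on their own reports. The vocabulary, the toy reports, the binary presence features, and the helper names (`vectorize`, `train_logistic_regression`, `predict_proba`) are hypothetical and not taken from the paper.

```python
import math


def vectorize(text: str, vocabulary: list[str]) -> list[float]:
    """Binary presence features restricted to a fixed vocabulary.

    Words outside `vocabulary` are ignored; this restriction is the assumed
    'feature constraint' (hypothetical, not the paper's definition)."""
    tokens = set(text.lower().split())
    return [1.0 if term in tokens else 0.0 for term in vocabulary]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this report: p(HIT) minus the 0/1 label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(text, vocabulary, w, b):
    """Probability that a report is HIT-related under the constrained model."""
    x = vectorize(text, vocabulary)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical shared vocabulary and toy locally coded reports (1 = likely HIT).
vocabulary = ["ehr", "order", "interface", "fell"]
reports = ["EHR order not shown", "interface dropped the order",
           "patient fell", "patient fell at night"]
labels = [1, 1, 0, 0]

X = [vectorize(r, vocabulary) for r in reports]
w, b = train_logistic_regression(X, labels)
p_hit = predict_proba("CPOE order failed in EHR", vocabulary, w, b)
p_fall = predict_proba("visitor fell", vocabulary, w, b)
```

Because `vectorize` only looks up terms that are in `vocabulary`, a word such as "cpoe" in a new report contributes nothing; under this reading, a site wanting local terms would add them to the list and retrain on its own coded reports.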
METHODS: A total of 5,287 PSE reports, manually coded as likely or unlikely to be related to HIT, were used to train unigram, bigram, and combined logistic regression and support vector machine (SVM) models using five-fold cross-validation. A difference-based scoring approach was used to prioritize and select unigram and bigram features by their relative importance to likely versus unlikely HIT reports. A held-out set of 2,000 manually coded reports was used for testing.
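The abstract does not give the exact difference-based scoring formula. The sketch below assumes one simple variant: for each unigram or bigram, the fraction of likely-HIT reports containing it minus the fraction of unlikely-HIT reports containing it, with features ranked by the absolute value of that difference so that strong indicators of either class are kept. The tokenizer, the scoring formula, the tie-breaking rule, and the function names (`tokenize`, `difference_scores`, `select_top_k`, and the helpers) are illustrative assumptions, and the reports are toy examples rather than study data.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (simplifying assumption)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def unigrams_and_bigrams(text: str) -> set[str]:
    """Return the set of unigrams and space-joined bigrams present in one report."""
    tokens = tokenize(text)
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return set(tokens) | set(bigrams)


def document_frequency(reports: list[str]) -> Counter:
    """Count, for each feature, how many reports contain it at least once."""
    counts: Counter = Counter()
    for report in reports:
        counts.update(unigrams_and_bigrams(report))
    return counts


def difference_scores(likely: list[str], unlikely: list[str]) -> dict[str, float]:
    """Hypothetical difference score: share of likely-HIT reports containing a
    feature minus share of unlikely-HIT reports containing it."""
    df_likely = document_frequency(likely)
    df_unlikely = document_frequency(unlikely)
    features = set(df_likely) | set(df_unlikely)
    return {
        f: df_likely[f] / len(likely) - df_unlikely[f] / len(unlikely)
        for f in features
    }


def select_top_k(scores: dict[str, float], k: int) -> list[str]:
    """Prioritize by absolute score (ties broken alphabetically) and keep the top k."""
    return sorted(scores, key=lambda f: (-abs(scores[f]), f))[:k]


likely = ["EHR order did not display", "order not transmitted by EHR interface"]
unlikely = ["patient fell in hallway", "patient fell near bed"]
scores = difference_scores(likely, unlikely)
print(select_top_k(scores, k=3))  # ['ehr', 'fell', 'not']
```

Here k = 3 only for illustration; the results section reports a 300-unigram model. Ranking by absolute difference is one design choice among several; selecting the top features separately for each class would be an equally reasonable variant.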
OBJECTIVE: The objective of this paper was to identify health information technology (HIT)-related events from the free-text descriptions in patient safety event (PSE) reports. A difference-based scoring approach was used to prioritize and select model features, and a feature-constraint model was developed and evaluated to support the analysis of PSE reports.
RESULTS: Unigram models tended to perform better than bigram and combined models. A 300-unigram logistic regression model had classification performance comparable to that of a 4,030-unigram SVM model, with a faster relative run time. Evaluated on the testing data, the 300-unigram logistic regression model had an AUC of 0.931 and an F1-score of 0.765.
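AUC and F1-score summarize different things: AUC measures how well a model's scores rank likely-HIT reports above unlikely ones across all thresholds, while F1 is the harmonic mean of precision and recall at a single decision threshold. The minimal pure-Python sketch below computes both on toy labels and scores, not the study's data; the 0.5 threshold and the function names are assumptions.

```python
def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """Harmonic mean of precision and recall for the positive (likely-HIT) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive report is scored above a
    random negative one (ties count as one half). O(P*N), fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and model scores (not study data); assumed threshold of 0.5.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(round(f1_score(y_true, y_pred), 3))  # 0.667
print(round(roc_auc(y_true, scores), 3))   # 0.889
```

In this toy case one positive report is missed and one negative is flagged at the 0.5 threshold, so precision and recall are both 2/3; the AUC is higher because 8 of the 9 positive-negative pairs are still ranked correctly.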
Language: English