Automated ICD-9 Coding via A Deep Learning Approach

Li, M., Fei, Z., Zeng, M., Wu, F., Li, Y., Pan, Y., & Wang, J. (2018). IEEE/ACM Transactions on Computational Biology and Bioinformatics.

INTRODUCTION
In the ICD-9 system, each disease has a unique code, and these codes are used in electronic health records as a billing mechanism. ICD-9 coding is usually done by coders in a hospital's Medical Record Department, who assign ICD-9 codes to each medical record according to the physician's clinical diagnosis. Because coders must master specialized knowledge of medicine, coding rules, and medical terminology, manual coding is expensive, time-consuming, and inefficient. Given these constraints, there is an urgent need for an accurate computational approach to automated ICD-9 coding.

automatically assign ICD-9 codes
Automated ICD-9 coding usually faces the following problems:
1) Patients' clinical records are not always structured in the same way, so it is very difficult to extract important and relevant knowledge effectively from the various kinds of medical records.
2) The medical field has many terms whose meaning is difficult for non-professionals to understand. Additional tools are needed to interpret some terms and symptoms and to extract semantic information from medical records.
3) Each physician usually has his or her own way of describing symptoms, so even the same disease can be described in many different ways.

Baseline model SVM
Perotte et al. used a flat SVM as the baseline model on the MIMIC-II dataset (22,815 documents), with documents represented as a 'bag of words' (BoW), and proposed a hierarchy-based SVM that obtained a higher F-measure than the flat SVM. The flat SVM treats each ICD-9 code independently of the others, whereas the hierarchy-based SVM builds the hierarchical structure of ICD-9 codes into its model. Perotte et al. concluded that the hierarchy-based SVM tends to achieve higher recall and F-measure at the cost of precision.
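To make the flat vs. hierarchy-based contrast concrete, here is a minimal Python sketch of top-down prediction over a code tree. It assumes, as a simplification of Perotte et al.'s method, that every tree node has its own binary classifier score and that a child code is only considered when its parent node was predicted positive. The toy tree, the scores, and the function names are hypothetical; the flat baseline simply thresholds every score independently.

```python
from typing import Dict, List, Set

# Hypothetical toy fragment of the ICD-9 hierarchy: parent -> children.
TOY_TREE: Dict[str, List[str]] = {
    "ROOT": ["390-459"],        # diseases of the circulatory system
    "390-459": ["401"],         # essential hypertension
    "401": ["401.1", "401.9"],  # leaf codes
}


def predict_flat(scores: Dict[str, float], threshold: float = 0.0) -> Set[str]:
    """Flat baseline: every code is decided independently of the others."""
    return {code for code, s in scores.items() if s > threshold}


def predict_top_down(scores: Dict[str, float], tree: Dict[str, List[str]],
                     root: str = "ROOT", threshold: float = 0.0) -> Set[str]:
    """Hierarchy-aware prediction: descend into a child only if its own
    classifier score exceeds the threshold, so a code is never predicted
    unless all of its ancestors were predicted too."""
    predicted: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            if scores.get(child, float("-inf")) > threshold:
                predicted.add(child)
                stack.append(child)
    return predicted


# Hypothetical per-node SVM decision values for one discharge summary.
scores = {"390-459": 1.2, "401": -0.4, "401.9": 0.8, "401.1": -1.0}
print(sorted(predict_flat(scores)))                # ['390-459', '401.9']
print(sorted(predict_top_down(scores, TOY_TREE)))  # ['390-459']
```

Here the flat model predicts 401.9 even though its parent category was rejected, while the top-down walk stops at the rejected parent. This is one way a hierarchy can change the precision/recall balance.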

DeepLabeler CNN + D2V
In this paper, we propose an end-to-end deep learning framework, called DeepLabeler, which combines a CNN with the D2V technique to assign ICD-9 codes automatically. We performed extensive experiments on the MIMIC-II and MIMIC-III datasets, and the results show that DeepLabeler outperforms the hierarchy-based SVM.

D2V Document to Vector
D2V ('Document to Vector') is an unsupervised algorithm proposed by Le et al. for learning fixed-length feature representations from variable-length documents. During D2V training, each document and each word is encoded as a unique dense vector; we call the document vector the document embedding (DE) and the word vector the word embedding (WE). These vectors are concatenated to predict the next word in the context. D2V can capture effective global features of a given medical text. Specifically, D2V generates document vectors and word vectors for the training dataset, and each document vector can then be regarded as a summary of a medical text. Finally, these vectors are used to train the CNN for prediction. To the best of our knowledge, the D2V technique had not previously been used for ICD-9 code assignment.

CNN convolutional neural networks
To further improve machine learning methods for automatically assigning ICD-9 codes, we borrow ideas from recent breakthroughs in deep learning. In particular, convolutional neural networks (CNNs) have proven effective for computer vision tasks such as medical image classification, object detection, and image segmentation. CNN models are also widely used in Natural Language Processing (NLP) and have achieved promising results in semantic parsing, search query retrieval, sentence modeling, sentence classification, prediction, and other traditional NLP tasks. Local context features are critical for automatic ICD-9 code assignment from discharge summaries; specifically, medical terms, which consist of one or more words, are probably the most effective features for ICD-9 code assignment.

METHODS DeepLabeler
Before training DeepLabeler, we pre-train the word vectors (WE) and document vectors (DE). Given a document, the word vectors are used to train the CNN part, which produces a highly informative feature vector representing the whole document. At the same time, the document vector is fine-tuned by one fully connected layer in the D2V part and then concatenated with the feature vector generated by the CNN. Finally, a fully connected layer is stacked on top to output a result vector giving the probability of each ICD-9 code. A sigmoid activation function is applied in the output layer, as is standard practice in multi-label learning, and a code is assigned when its output score is greater than a pre-determined threshold.
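The data flow described above can be sketched in plain Python for a single document. This is a minimal illustration, not DeepLabeler's TensorFlow implementation: `deeplabeler_head`, the toy weights, and the ReLU in the D2V branch are assumptions, and training is omitted. It shows why a sigmoid (rather than a softmax) suits multi-label coding: each code gets an independent probability, so any number of codes can pass the threshold.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(x: Sequence[float], W: Sequence[Sequence[float]], b: Sequence[float]) -> Vector:
    """Fully connected layer: one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def deeplabeler_head(cnn_feat: Vector, doc_vec: Vector,
                     W_d2v: List[Vector], b_d2v: Vector,
                     W_out: List[Vector], b_out: Vector,
                     codes: List[str], threshold: float = 0.5) -> Tuple[List[str], Vector]:
    # D2V branch: fine-tune the pre-trained document vector with one FC layer
    # (ReLU is an assumption; the activation is not specified in the source).
    d2v_feat = [max(0.0, z) for z in dense(doc_vec, W_d2v, b_d2v)]
    # Concatenate the CNN feature vector with the D2V feature vector.
    h = list(cnn_feat) + d2v_feat
    # Output layer: one independent sigmoid per ICD-9 code (multi-label).
    probs = [sigmoid(z) for z in dense(h, W_out, b_out)]
    # Assign every code whose score exceeds the threshold.
    assigned = [c for c, p in zip(codes, probs) if p > threshold]
    return assigned, probs


codes = ["401.9", "530.81", "276.7"]
assigned, probs = deeplabeler_head(
    cnn_feat=[1.0], doc_vec=[2.0], W_d2v=[[0.5]], b_d2v=[0.0],
    W_out=[[1.0, 1.0], [-1.0, -1.0], [0.5, -0.5]], b_out=[0.0, 0.0, 0.0],
    codes=codes, threshold=0.5)
print(assigned)  # ['401.9']
```

With these toy weights the three probabilities are roughly 0.88, 0.12 and 0.5; lowering the threshold to 0.2 would also assign 276.7, which is the precision/recall knob the experiments tune.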

D2V Document to Vector
In this framework, every word and the whole document are each represented by a unique dense vector. The document vector and word vectors are then concatenated to predict the next word in the context. In this way, word order is taken into account, so better vector representations can be obtained. A fully connected neural network with one hidden layer is used to train the model, using the backpropagation algorithm with stochastic gradient descent. Learning the document and word vectors is regarded as unsupervised preprocessing of the clinical free text.
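A toy sketch of the idea described above (the distributed-memory variant of Le et al.'s paragraph vectors): the document vector is concatenated with the context word vectors, a softmax layer predicts the next word, and each SGD step backpropagates into both the output weights and the document vector. This is a simplification under stated assumptions: a full softmax replaces the hierarchical softmax or negative sampling used in practice, word vectors are held fixed, and all names and numbers are hypothetical. In Gensim's `Doc2Vec`, `dm=1` selects this variant and `dm_concat=1` makes it concatenate rather than average the context vectors.

```python
import math
from typing import List

Vector = List[float]


def concat_input(doc_vec: Vector, context_vecs: List[Vector]) -> Vector:
    """Model input: the document vector followed by the context word vectors."""
    x = list(doc_vec)
    for v in context_vecs:
        x.extend(v)
    return x


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_probs(x: Vector, W: List[Vector], b: Vector) -> Vector:
    """Linear layer + softmax over the vocabulary (one row of W per word)."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) + bj
                    for row, bj in zip(W, b)])


def sgd_step(doc_vec: Vector, context_vecs: List[Vector], target: int,
             W: List[Vector], b: Vector, lr: float = 0.1) -> float:
    """One SGD step on the cross-entropy of predicting word index `target`.
    Updates W, b and doc_vec in place; returns the loss before the update."""
    x = concat_input(doc_vec, context_vecs)
    p = next_word_probs(x, W, b)
    loss = -math.log(p[target])
    # Gradient of the loss w.r.t. each logit: p_j - 1[j == target].
    g = [pj - (1.0 if j == target else 0.0) for j, pj in enumerate(p)]
    # Backpropagate to the input before W is modified.
    grad_x = [sum(g[j] * W[j][i] for j in range(len(W))) for i in range(len(x))]
    for j, row in enumerate(W):
        for i in range(len(row)):
            row[i] -= lr * g[j] * x[i]
        b[j] -= lr * g[j]
    # Only the first len(doc_vec) input entries belong to the document vector.
    for i in range(len(doc_vec)):
        doc_vec[i] -= lr * grad_x[i]
    return loss


# Toy setup: 4-word vocabulary, 2-dim vectors, 2 context words.
doc = [0.1, -0.2]
context = [[0.3, 0.1], [-0.1, 0.2]]
W = [[0.0] * 6 for _ in range(4)]
b = [0.0] * 4
losses = [sgd_step(doc, context, target=2, W=W, b=b) for _ in range(20)]
print(round(losses[0], 4))  # 1.3863, i.e. ln 4 for uniform predictions
```

After training, `doc` plays the role of the document embedding (DE) that DeepLabeler later fine-tunes.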

CNN Convolutional neural network
Before a CNN model is applied, each word in a document is generally mapped to a low-dimensional dense word vector using W2V (word2vec), and all the word vectors of the same document form a matrix. This document matrix is then fed to the CNN model, which outputs a fixed-length feature vector representing the document in a low-dimensional dense form. Unlike D2V, this is a supervised learning process based on stochastic gradient descent and backpropagation. CNNs use layers with convolving filters that are applied to local features, which means the network automatically learns the features that had to be hand-engineered in traditional algorithms.
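A minimal sketch of the CNN step just described, for one convolutional layer followed by max-over-time pooling (a common text-CNN design, assumed here; the source does not spell out the pooling). The embedding table, filter weights, and function names are hypothetical; in DeepLabeler the filters are learned by SGD and backpropagation rather than set by hand.

```python
from typing import Dict, List

Matrix = List[List[float]]


def embed(tokens: List[str], table: Dict[str, List[float]], dim: int) -> Matrix:
    """Stack one word vector per token; unknown words map to a zero vector."""
    return [table.get(t, [0.0] * dim) for t in tokens]


def conv_maxpool(doc: Matrix, filters: List[Matrix], biases: List[float]) -> List[float]:
    """Slide each k x d filter over every window of k consecutive word vectors,
    apply ReLU, then keep the maximum response (max-over-time pooling).
    The result has one value per filter, whatever the document length."""
    features = []
    for f, bias in zip(filters, biases):
        k = len(f)
        responses = []
        for start in range(len(doc) - k + 1):
            window = doc[start:start + k]
            s = sum(fr[c] * wr[c] for fr, wr in zip(f, window) for c in range(len(fr)))
            responses.append(max(0.0, s + bias))
        features.append(max(responses) if responses else 0.0)
    return features


# Hypothetical 2-dim embeddings for a 3-word phrase.
table = {"chest": [1.0, 0.0], "pain": [0.0, 1.0], "radiating": [1.0, 1.0]}
doc = embed(["chest", "pain", "radiating"], table, dim=2)
filters = [[[1.0, 0.0], [0.0, 1.0]],  # width-2 filter, fires on "chest pain"
           [[-1.0, -1.0]]]            # width-1 filter
print(conv_maxpool(doc, filters, biases=[0.0, 0.0]))  # [2.0, 0.0]
```

A width-2 filter responds to a two-word pattern wherever it appears, which is how local features such as multi-word medical terms can be detected.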

The Benefits of DeepLabeler
A CNN can extract rich local features, but it has two major defects when extracting features from the medical data in our task. First, the CNN ignores the semantic features of the full text because it does not take into account the order of words or phrases. Second, the number of words in a discharge summary varies from dozens to thousands, but the CNN model only accepts an input matrix of fixed shape. This means that many document matrices must be either truncated or zero-padded, and truncation may lose a certain proportion of the original information in the document. Therefore, the performance of the CNN model may be seriously affected when document lengths vary significantly. Foreseeing these problems, we realized that the D2V part can be an important complement to the CNN part. D2V captures the semantic information of a whole document by taking word order into consideration. More importantly, the D2V part uses all the words in a document for training, so it does not discard any useful word information, which is not possible for the CNN part. This inspired us to integrate the CNN and D2V parts to achieve better performance on this multi-label classification and natural language understanding task.
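To see how much the fixed input size can cost, the sketch below pads or truncates token-id sequences to one length. The cut-off `MAX_LEN = 2000` is a hypothetical value (the paper treats the maximum note length as a tuned hyperparameter), and id 0 is assumed to be reserved for padding. The two note lengths used, 7,980 and 9 words, are the maximum and minimum reported for the MIMIC-III discharge summaries.

```python
from typing import List, Tuple


def pad_or_truncate(token_ids: List[int], max_len: int,
                    pad_id: int = 0) -> Tuple[List[int], int]:
    """Force a note to exactly max_len tokens for the CNN input.
    Returns the fixed-length sequence and how many tokens were discarded."""
    if len(token_ids) >= max_len:
        return token_ids[:max_len], len(token_ids) - max_len
    return token_ids + [pad_id] * (max_len - len(token_ids)), 0


MAX_LEN = 2000  # hypothetical maximum note length
longest, dropped = pad_or_truncate(list(range(1, 7981)), MAX_LEN)
shortest, _ = pad_or_truncate(list(range(1, 10)), MAX_LEN)
print(len(longest), dropped)  # 2000 5980
print(shortest.count(0))      # 1991
```

Under this cut-off the longest note loses three quarters of its words while the shortest is almost entirely padding; the D2V branch sees every word regardless.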

MIMIC Multi-parameter Intelligent Monitoring in Intensive Care
MIMIC is a real-world Intensive Care Unit (ICU) database and a popular resource for studying automated ICD-9 coding. It is a publicly available database developed by the MIT Lab for Computational Physiology, comprising de-identified health data associated with tens of thousands of critical care patients, including discharge summaries, diagnostic codes, vital signs, and laboratory measurements. The latest version, MIMIC-III, comprises over 58,000 hospital admissions, with data spanning June 2001 to October 2012. MIMIC-II is an older version covering 2001 to 2008. Because these medical records come from ICU patients, the disease distribution is skewed, and serious diseases are likely to be overrepresented.

The top 105 codes make up 50% of the total codes
Rank   Frequency of patients with code   ICD-9 code
1      0.3513                            401.9: Hypertension
10     0.1073                            530.81: Esophageal reflux
50     0.0368                            276.7: Hyperpotassemia
100    0.0205                            V10.46: Personal history of malignant neoplasm of prostate
500    0.0037                            785.2: Undiagnosed cardiac murmurs
1000   0.0014                            999.8: Other and unspecified transfusion reaction not elsewhere classified
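Statistics like those in this table can be computed from per-patient code sets, as sketched below. Two assumptions: "frequency" is read as the fraction of patients who carry a code, and "50% of the total codes" is read as half of all code assignments. The toy patients are made up for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple


def code_frequencies(patients: List[Set[str]]) -> List[Tuple[str, float]]:
    """Fraction of patients carrying each code, most frequent first."""
    counts = Counter(code for codes in patients for code in codes)
    n = len(patients)
    return sorted(((c, k / n) for c, k in counts.items()),
                  key=lambda item: (-item[1], item[0]))


def top_k_for_share(patients: List[Set[str]], share: float = 0.5) -> int:
    """Smallest number of most frequent codes whose assignments cover
    at least `share` of all code assignments."""
    counts = Counter(code for codes in patients for code in codes)
    total = sum(counts.values())
    running = 0
    for k, (_, cnt) in enumerate(counts.most_common(), start=1):
        running += cnt
        if running >= share * total:
            return k
    return len(counts)


toy = [{"401.9", "530.81"}, {"401.9"}, {"401.9", "276.7"}, {"530.81"}]
print(code_frequencies(toy))  # [('401.9', 0.75), ('530.81', 0.5), ('276.7', 0.25)]
print(top_k_for_share(toy))   # 1
```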

There are three factors that make automated ICD-9 coding difficult.
Total # of discharge summaries             52,962
Avg. # of words per discharge summary      1,524
Max # of words per discharge summary       7,980
Min # of words per discharge summary       9
Total # of codes                           6,984
Avg. # of codes per patient                11
Max # of codes per patient                 39
Min # of codes per patient                 1

First, the number of ICD-9 codes is large and their distribution is heavily skewed: most ICD-9 codes have only a few annotated medical records, which makes the dataset severely imbalanced. Second, the number of ICD-9 codes per patient varies greatly: some patients have as many as 39 associated codes, while others have only one. Third, the average number of samples per code is small: with 6,984 codes and 52,962 samples, each code has only 7.58 samples on average, so most codes lack enough samples for training, which leads to poor classification performance.

EXPERIMENTS Evaluation Metrics
We used micro-averaged precision, micro-averaged recall, and micro-averaged F-measure to evaluate our model. The F-measure is the harmonic mean of precision and recall and a good indicator of a model's overall predictive power; it is the most important metric for the automated ICD-9 coding task. Our model returns a probability for each ICD-9 code, which allows further tuning to balance precision and recall. In this study, the tuning is carried out by specifying a threshold, as the models have already been optimized for F-measure. We chose the micro-averaged F-measure because we are interested in predicting correct ICD-9 codes for as many patients as possible, rather than in ensuring good coverage of the different classes.
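A minimal sketch of the micro-averaged metrics over per-patient code sets (function names and toy data are hypothetical). The last line checks the harmonic-mean definition against the DeepLabeler result reported for MIMIC-III at threshold 0.2 (precision 0.486, recall 0.351, F-measure 0.408).

```python
from typing import List, Set, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_prf(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaging pools true positives over all patients and all codes
    before dividing, so frequent codes and code-rich patients weigh more."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall, f_measure(precision, recall)


gold = [{"401.9", "530.81"}, {"276.7"}]
pred = [{"401.9"}, {"276.7", "785.2"}]
print(micro_prf(gold, pred))              # 2 of 3 predictions correct, 2 of 3 gold codes found
print(round(f_measure(0.486, 0.351), 3))  # 0.408
```

Macro-averaging would instead compute the metrics per code and average them, giving rare codes equal weight, which is the "coverage of the different classes" the study chose not to optimize.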

EXPERIMENTS Implementation
We used the open-source tool TensorFlow [45] to implement the CNN model. Document embedding and word embedding were implemented with Gensim, using the skip-gram model to train the word embedding vectors.
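For readers unfamiliar with skip-gram, the sketch below generates the (center, context) pairs it trains on; the model then learns word vectors that make each context word predictable from its center word. This shows only the data-preparation idea in plain Python, not Gensim's implementation: in Gensim, skip-gram is selected with `Word2Vec(..., sg=1)`, and the library additionally handles negative sampling or hierarchical softmax and subsampling of frequent words. The function name and example phrase are hypothetical.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """(center, context) training pairs: every word within `window`
    positions of the center word becomes a prediction target."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["chest", "pain", "radiating"], window=1))
# [('chest', 'pain'), ('pain', 'chest'), ('pain', 'radiating'), ('radiating', 'pain')]
```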

parameters
To obtain the best performance from DeepLabeler, we searched over various network-architecture parameters to find the best settings for automated ICD-9 coding. These parameters include:
1) CNN part: word embedding size, maximum note length, convolutional kernel size, number of convolutional filters, number of convolutional layers, and dropout rate;
2) D2V part: document embedding size and the number of neurons in the hidden layer.
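The search described above can be sketched as an exhaustive grid search. This is one simple strategy, assumed for illustration (the source does not state how the search was carried out); the candidate values and the scoring function are hypothetical, and each real evaluation would mean training DeepLabeler and measuring micro F-measure on held-out data. With just two values for each of the eight parameters, the grid already has 2^8 = 256 configurations.

```python
import itertools
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical candidate values, not the paper's actual search ranges.
SEARCH_SPACE: Dict[str, List[Any]] = {
    # CNN part
    "word_embedding_size": [100, 200],
    "max_note_length": [1000, 2000],
    "kernel_size": [3, 5],
    "filter_number": [64, 128],
    "conv_layer_number": [1, 2],
    "dropout_rate": [0.3, 0.5],
    # D2V part
    "doc_embedding_size": [100, 200],
    "hidden_neurons": [128, 256],
}


def grid_search(space: Dict[str, List[Any]],
                evaluate: Callable[[Dict[str, Any]], float]) -> Tuple[Dict[str, Any], float]:
    """Try every combination and keep the one with the highest score
    (for example, micro F-measure on held-out data)."""
    names = sorted(space)
    best_cfg: Dict[str, Any] = {}
    best_score = float("-inf")
    for values in itertools.product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def fake_eval(cfg: Dict[str, Any]) -> float:
    """Stand-in scorer for illustration only; a real one trains and validates."""
    return cfg["kernel_size"] + cfg["dropout_rate"] + cfg["filter_number"] / 100


best, score = grid_search(SEARCH_SPACE, fake_eval)
print(best["kernel_size"], best["dropout_rate"], best["filter_number"])  # 5 0.5 128
```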

The optimal structure for automated ICD-9 coding over various parameters of network architectures
[Figure: parameter search results; panels (a) and (b) MIMIC-II, panels (c) and (d) MIMIC-III]

deep learning model is often more effective when more data are available
MIMIC-III: 52,962 discharge summaries containing 6,984 associated codes
Method                       Micro-Precision   Micro-Recall   Micro-F-measure
flat-SVM                     0.635             0.158          0.253
hierarchy-based SVM          0.415             0.280          0.355
D2V+CNN (threshold = 0.2)    0.486             0.351          0.408
D2V+CNN (threshold = 0.3)    0.555             0.292          0.383

MIMIC-II: discharge summaries containing 5,031 associated codes
Method                       Micro-Precision   Micro-Recall   Micro-F-measure
flat-SVM                     0.562             0.130          0.211
hierarchy-based SVM          0.395             0.233          0.293
D2V+CNN (threshold = 0.25)   0.475             0.258          0.335
D2V+CNN (threshold = 0.5)    0.616             0.159          0.253
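The threshold rows above show the precision/recall trade-off: a lower threshold assigns more codes, raising recall at the cost of precision. A minimal sketch of picking the threshold that maximizes micro F-measure follows. The toy probabilities and function names are hypothetical, and choosing the threshold on a validation split rather than the test set is an assumed good practice; the source does not say how the thresholds 0.2, 0.25, 0.3 and 0.5 were chosen.

```python
from typing import Dict, List, Sequence, Set, Tuple


def predict(probs: Dict[str, float], threshold: float) -> Set[str]:
    """Assign every code whose predicted probability exceeds the threshold."""
    return {code for code, p in probs.items() if p > threshold}


def micro_f(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F-measure over all patients and codes."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / sum(len(p) for p in pred)
    recall = tp / sum(len(g) for g in gold)
    return 2 * precision * recall / (precision + recall)


def best_threshold(gold: List[Set[str]], probs: List[Dict[str, float]],
                   thresholds: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, micro F-measure) for the best candidate threshold."""
    scored = [(micro_f(gold, [predict(p, t) for p in probs]), t) for t in thresholds]
    best_f, best_t = max(scored)
    return best_t, best_f


gold = [{"401.9"}, {"530.81", "276.7"}]
probs = [{"401.9": 0.9, "530.81": 0.3},
         {"401.9": 0.1, "530.81": 0.6, "276.7": 0.25}]
print(best_threshold(gold, probs, [0.2, 0.3, 0.5]))  # best threshold 0.2, micro F about 0.857
```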

CNN part is the most effective component
D2V is necessary for enhancing classification performance
Model                    Micro-Precision   Micro-Recall   Micro-F-measure
Only using CNN           0.440             0.366          0.399
Only using D2V           0.375             0.261          0.308
DeepLabeler (CNN+D2V)    0.486             0.351          0.408

CONCLUSION
We presented DeepLabeler, which combines a CNN part with a D2V part to extract local and global features. Analyzing the deep neural network structure, we found that the CNN is the most effective component of our network, which is intuitive. The D2V part is necessary for enhancing classification performance because it extracts well-recognized global features. Using only each patient's discharge summaries, DeepLabeler outperformed the traditional methods on the MIMIC-III dataset: our computational results show an increase of about 15% in micro F-measure over the flat SVM or the hierarchy-based SVM. To better evaluate our model, we compared the experimental results on MIMIC-II and MIMIC-III and found that its performance improves greatly (F-measure of 0.408 vs 0.335) when trained on roughly twice as much data. We believe DeepLabeler has the potential to perform even better if sufficient good-quality data are available.

Advantages of DeepLabeler
First, it can automatically extract effective representative features. Second, only one end-to-end model needs to be trained rather than thousands of binary classifiers, and the dependencies between labels are taken into account naturally. DeepLabeler has the potential to be applied in further areas of medical record understanding and analysis.
