ML Model Performance
I think this API is a great initiative, because there is scope for ML to improve outcomes in healthcare. Limited access to patient-level data is a big challenge, and simulating data (or upsampling) is one technique to overcome this. However, in my past research I tested upsampling techniques on a small dataset and found that they did little to improve results on real-life, out-of-sample patient data.
As I’m considering developing an open source application built on this, I’m trying to get a sense of the robustness of the ML model you’ve developed. I’ve read the patent document, but still have a few questions. I would greatly appreciate any clarifications:
- I understand that you used simulated patient data to train your model. How did you ensure that this simulated patient dataset was representative of the wider population?
- I read in the patent documents that real patient data was optional. What impact does this have on the model?
- How does the model fare against real-life patients?
- What are the ML performance metric scores for each outcome?
Thank you for your clarifications! They’ve helped me to better understand the approach. I’ve replied to you via the email address on the Endless Medical website.
Thank you for your questions, kind words, and interest.
Let me try to answer the best I can (sorry for the delay; I have just finished working with patients).
Chris, you are right: there are strings attached to simulated individual patient-level data. The data in the current database comes either from literature review (case descriptions, epidemiologic studies) or from expert experience (for now, the only expert is me). I choose epidemiologic studies that plainly describe the frequency/specificity/sensitivity of various findings in the history or physical examination observed in a given disease, and the connections between test results/symptoms/signs/diseases.
As for the literature review, I try (when I remember to) to add sources and references for diseases/symptoms/signs - and I promise I will get better at keeping them updated. See the Wiki fields in the JSON files.
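To make the sensitivity/specificity idea concrete, here is a hypothetical sketch of what one such disease entry might look like, and how per-finding sensitivity/specificity translate into a likelihood ratio. The field names, disease, and numbers are my own illustration, not the actual EndlessMedical schema or data:

```python
# Hypothetical disease entry with per-finding sensitivity/specificity,
# loosely modeled on the idea of Wiki-referenced JSON records.
# All names and numbers are illustrative, NOT the real schema or values.
disease_entry = {
    "disease": "CommunityAcquiredPneumonia",
    "wiki": "https://en.wikipedia.org/wiki/Pneumonia",  # literature reference
    "findings": {
        "fever":    {"sensitivity": 0.80, "specificity": 0.50},
        "crackles": {"sensitivity": 0.60, "specificity": 0.75},
    },
}

def positive_lr(finding):
    """Positive likelihood ratio LR+ = sens / (1 - spec):
    how much a *present* finding raises the odds of the disease."""
    sens, spec = finding["sensitivity"], finding["specificity"]
    return sens / (1.0 - spec)

for name, f in disease_entry["findings"].items():
    print(f"{name}: LR+ = {positive_lr(f):.2f}")
```

With entries like this, connecting symptoms/signs to diseases reduces to combining the likelihood ratios of the findings actually observed.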
As for my experience: having been a doctor for 25 years, working in various fields, I have seen so many cases and diseases that I can simply say which findings are common or uncommon for most of the disorders…
As we grow, we are planning to add more research and more experts, and to consolidate these sources using fluid reputation scoring/ranking, as described in the patent. Technically, once the system is up and running at full capacity, it should provide the means to validate (semi-automatically or even fully automatically) each and every case, disease, and feature in the database.
Having other experts join me is “in the making”. For now, my clinical experience, the literature review I regularly do as a practicing physician, targeted review specifically for the needs of the EndlessMedical API, and hours of testing are the only verification of API responses.
Would I use it now as the sole tool to treat or diagnose patients? Absolutely not! Would I use it as a source of suggestions and tips when managing patients? Yes.
With things actively developing (Konrad is doing the real heavy lifting), I would take all the API’s suggestions with a grain of salt. We are actively working on one app built on the API that would provide a means of data validation. I shall be able to share more soon - I think this app will make us all, including me, much more confident of the generalization of API responses.
To answer your question: at this point, the database contains no real patients - at all. It has not been validated against real patient data either. I have, however, tested its answers and combinations of symptoms to assure their fit to reality. I do this routinely after adding features or outcomes.
The option to consolidate simulated and real patient data was added mainly to leave room for IP protection in further development. It was also added for some future uses and users. I personally became naturally “opposed” to using actual patients’ data. I think I am throwing a fit - yes - for years (since 2009, to be exact) I wanted to get my hands on EMR / real patient databases, and I hit so many closed doors and brick walls that it is not even funny. I simply decided to live without it.
As for the other questions about ML quality: the ML modeling we use returns various internal (cross-validation) measures. I will prepare a summary for you at some point, if you wish.
If I remember well, the OOB error for the random forest was minimal, and this is because I keep adding features and cases until I have satisfactory discrimination between diseases (as described in the patent). Note that it is not that critical - in a way. Why? The system is not giving you a hard discrimination between diseases, and it is not a precise list of diseases ruled in and ruled out. Just like in real life (trust me on this one): for one patient, you may get 5 different lists/orderings of the most likely diagnoses from 5 different providers - the key is that 4 of the most likely diagnoses will most likely be listed in all 5 lists… Additionally, responses from some endpoints are averaged anyway, and the next diagnostic steps are calculated/suggested based on the top couple of suggestions, not only the first most likely diagnosis. So it really doesn’t matter as much whether pneumonia, asthma, or COPD takes 1st, 2nd, or 3rd place. The next suggested test will probably be a chest x-ray anyway… and they all need a pulmonary specialist if they get worse, and any reasonable clinician looking at the list would still consider all 3 if they show up in the first 3 places, regardless of the exact “likelihoods”.
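The “5 lists from 5 providers” point can be sketched in a few lines: instead of comparing exact rankings, check which diagnoses land in everyone’s top-k. The lists and diagnoses below are made up for illustration:

```python
# Five providers (or model runs) may order the same differential
# differently, but the top few diagnoses largely coincide.
# These lists are invented for illustration only.
lists = [
    ["pneumonia", "asthma", "COPD", "bronchitis", "PE"],
    ["asthma", "pneumonia", "COPD", "PE", "bronchitis"],
    ["COPD", "pneumonia", "asthma", "CHF", "bronchitis"],
    ["pneumonia", "COPD", "asthma", "bronchitis", "CHF"],
    ["asthma", "COPD", "pneumonia", "PE", "CHF"],
]

def topk_common(ranked_lists, k=3):
    """Diagnoses that appear in the top-k of every list,
    ignoring the exact rank within the top-k."""
    sets = [set(lst[:k]) for lst in ranked_lists]
    return set.intersection(*sets)

print(topk_common(lists, k=3))
```

Here all five providers disagree on the exact order, yet the same three diagnoses fill every top-3 - which is why exact rank matters less than top-k membership.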
It is less of an exact science than predicting creditor default; it will always be an art, a gut feeling.
One more thing I want you to consider: the API, in its current general public release, functions as a “hybrid tool”: some diseases it covers are more common in the emergency department (ED) and some are more common in general practice (GP). In the future, you will be able to choose a setting - for example, general practice/office or emergency department - and the system will “re-calculate” the pretest likelihood and hold to it while returning responses.
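A minimal sketch of what re-calculating the pretest likelihood by setting could look like, using a plain Bayesian update. All priors and likelihoods below are invented numbers, not values from the actual model:

```python
# Setting-specific pretest probabilities (priors) re-weight the same
# finding likelihoods. All numbers are made up for illustration.
pretest = {
    "emergency_department": {"pneumonia": 0.15, "asthma": 0.10, "URI": 0.30},
    "general_practice":     {"pneumonia": 0.03, "asthma": 0.12, "URI": 0.55},
}

# Likelihood of the observed findings under each disease (illustrative).
likelihood = {"pneumonia": 0.6, "asthma": 0.3, "URI": 0.2}

def posterior(setting):
    """Bayes' rule: posterior ∝ pretest(setting) × likelihood(findings),
    normalized so the probabilities sum to 1."""
    unnorm = {d: pretest[setting][d] * likelihood[d] for d in likelihood}
    z = sum(unnorm.values())
    return {d: round(p / z, 3) for d, p in unnorm.items()}

print("ED:", posterior("emergency_department"))
print("GP:", posterior("general_practice"))
```

With identical findings, pneumonia tops the ED list while a URI dominates in general practice - the setting prior alone reorders the differential.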
Take care and stay safe. What is the app you have in mind, if you wish to disclose?