
NHS, NLP, LLM, GPT, CSO, PhD ... and any other 3 letter acronyms you can think of



My Hiatus

Dear friends, Apologies for the hiatus.

I've not published for a whole month - apart from the joys of spending quality time with my family over Christmas and New Year, I've also spent much of the last month preparing myself for an interview.

An Interview?! But Sam, You're in your dream job - What could you possibly be interviewing for now? Haven't you got it all? Haven't you achieved enough?! You're a doctor, an oncology registrar at the coveted Beatson Cancer Centre...

What else in the universe could you possibly need or want you lunatic!?!

WHEN WILL YOU BE ENOUGH?!!? ...

A series of questions I often ask myself.


However, as many of you who know me personally will attest from my occasional rants, I'm trying, in my own little way, to save the NHS.

I truly believe it is one of the greatest things we have ever created on our little island.

And whilst people will complain about the 20-hour waits in A&E and spending a similar amount of time on hold trying to get a GP appointment, which I'm not saying isn't really annoying and really quite crap, but if they were to be offered a glimpse of the alternative...

i.e. a £5000 bill every time they went to A&E and got a bag of fluids and some paracetamol, they may quickly find the long wait not so bad after all.

(I'm not exaggerating - This is the actual experience of my Californian relatives, and they are the lucky ones who have health insurance. I wouldn't even want to speculate what it costs without).


So whilst we have a free health service, and when I look at my family and know one day we will have to use it, I'm going to do every damn thing I can to save it.

And frankly, so should you.


So about the interview. (And all the mnemonics above)

In the oodles of spare time I have between my registrar job, looking after my 2 kids, and the additional work I do through Adopt-A-Doc, I put together a 10,000-word Ph.D. proposal to something called the Chief Scientist Office (CSO).

The CSO is a Scottish Government Health Directorate with a vision to support and increase the level of high-quality health research conducted in Scotland.


I want to bring some much-needed AI and digital innovation into the NHS (see previous rants and open letters), specifically for our cancer treatment services.

Splicing digital technology, an app, a workflow change, and a sprinkle of generative AI,

I hope to bake a cake of streamlined patient services and greatly improved, empowered patient experiences. And it's just crazy enough to work.


And I just found out that I got my proposal funded! Yeeehaaaaw. Awesome.

I'll keep you posted on how that goes!


So today let's talk about the technology I want to use for the innovation, what AI can do for the NHS, how it does it - and all that jazz.


All that Jazz:

Co-pilots, Chat Bots, Natural Language Processing (NLP), and Large Language Models (LLM)

As you might guess by the name, NLP is a field of AI that gives computers the ability to manipulate and process human language.

It's an extremely bizarre idea that computers can use and understand language.

I've been absolutely stunned by what my co-pilot can generate. It's something I've struggled to accept on a fundamental level since I started using current AI.

Probably when it comes to such things, when my brain doesn't want to accept it, it's because it doesn't really understand it ;)


So how is it, that by applying machine learning to large datasets - you get something at the end of it that appears to understand the text? What is the LLM doing?

Can it really understand the text? Or has it just got really good at looking like it does...


I can type out frankly barely understandable sentences, and my copilot will give me correct and seemingly well-thought-out answers with a plethora of references and examples to back itself up. Examples that it appears to generate itself. And they make sense...

Although it often uses cats and dogs in its examples, and it should know by now that I don't really like dogs. AI fail.


However, whilst coming up with the snappy title for this post I thought I'd put my co-pilot to the test :


Me and Bing Co-Pilot having a chat:

is NHS a mneumonic?
Yes, NHS is a mnemonic. However, the meaning of NHS can vary depending on the context. In the context of healthcare, NHS stands for National Health Service,

Followed by:

is NHS a acronym?
Yes, NHS is an acronym. An acronym is a type of abbreviation that comprises of using the first letters of a series of words to indicate a new word. The most significant feature of an acronym is that it can be pronounced as a new word, unlike initialisms.
In the context of healthcare, NHS stands for National Health Service, which is the publicly funded healthcare system in the United Kingdom

So is the NHS a mneumonic or an acronym?
... repeats answer above.

Is the NHS a mneumonic?... repeats acronym answer above.

So to my eye, this isn't really understanding in any real sense.

Although maybe after a few questions, the bot realized I was talking about the National Health Service (NHS) acronym and not a potential mnemonic?

Who knows.


I asked the same question twice at different points and got two different answers, so it must be following the thread. It's all a bit bizarre though, isn't it?

I notice that ChatGPT does something slightly different and 'apologises for the misunderstanding in the previous answer'. Aw :) The future of civility starts with our robots.

So let's dive a bit deeper and understand what is behind this remarkable system of question and answers that's going to replace us all at work and become Skynet.


NLP and LLM

NLP and LLM are often used interchangeably, but that's not quite right, so here's a quick breakdown so we all know what we're talking about:


Natural Language Processing (NLP) 

NLP is the broader field of artificial intelligence that encompasses the aim of understanding, interpreting, and generating human language.

Within NLP are tools such as speech recognition (Think Alexa) or Natural Language Generation (Think Chatbots).

There are a lot of tools of the trade to create these things. If you search around the methods of such madness, you'll come across tools such as text classification, sentiment analysis, machine translation, text mining, text summarization, etc.


Large Language Models (LLMs) 

An LLM is a model built from a specific type of neural network that can process and generate natural language text. The one most people will be familiar with is ChatGPT, the original game changer.

All the big players like Meta (Facebook), Microsoft, and Google have their own rendition.

(LLaMA, Copilot, Bard)


Much like Google - one day it wasn't there, and then suddenly, it was everywhere.

For many, again much like Google, once you've found it and started using it - you can't seem to do without it. I am certainly in that box now. I can hardly write an e-mail without running it through my AI bot these days :D

'How are you today?' - Hmm, How AM I feeling today?! 
Not sure. Let's check with the co-pilot. 
Hmm, I'm okay apparently. 'I'm okay thanks'.

So where did it come from? A quick segue into the history and how in the world we went from a Google search bar to an all-knowing chatbot machine of awesomeness and power in the last couple of years?


The rise of the TRANSFORMERS

The change / new chatbot revolution has been brought about by a change in technology. Specifically, the transformer neural network architecture.

LLMs are built (using this transformer architecture) by finding relationships, patterns, and semantics present in the text data, which allows them to generate coherent and contextually appropriate responses to new questions.


Image:

This is the actual and real transformer architecture.



I knew Optimus Prime was real. 80's hero.


LLMs are trained on huge amounts of data - think as big as the internet.

The training part is complex and I'll delve into that in a bot more detail below. (Hah pun :))


If you look up the various aforementioned LLM chatbots, they are slowly getting more impressive and boasting about something called their number of parameters.

Seemingly as a measure of their individual awesomeness, GPT-3 boasted about 175 billion parameters (OpenAI hasn't published a figure for GPT-4). Brilliant. ... So what's a parameter?


Parameters (a.k.a - The chatbot exhibition of egos / urinating contest)

As one might expect, the number of parameters in a Large Language Model (LLM) is a crucial factor that can significantly impact its capabilities.


Parameters are the internal weights and biases that the model adjusts to make predictions and generate responses based on the input it receives.

In general, having more parameters allows the model to capture and represent a more complex set of patterns, relationships, and nuances present in the training data.
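To make "weights and biases" a bit more concrete, here's a minimal sketch that counts the parameters in a tiny fully connected network. The layer sizes are made up for illustration (hypothetical, nothing to do with any real LLM), but the counting logic is the same idea scaled up by a few billion.

```python
# Hypothetical layer sizes, chosen only for illustration:
# 4 inputs -> 8 hidden neurons -> 3 outputs.
layer_sizes = [4, 8, 3]


def count_parameters(sizes):
    """Count the weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = n_in * n_out  # one weight per connection between layers
        biases = n_out          # one bias per neuron in the receiving layer
        total += weights + biases
    return total


total_parameters = count_parameters(layer_sizes)
print(total_parameters)  # (4*8 + 8) + (8*3 + 3) = 67
```

So even this toy network has 67 numbers to learn. Now imagine 175 billion of them.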


Although it's not as straightforward as more parameters meaning a better chatbot.

There are other factors at play, like model design, training data quality, fine-tuning, computational resources, and evaluation metrics.

There will be diminishing returns beyond a certain point, and the optimal number of parameters depends on the complexity of the tasks the model is designed to handle and the available computational resources.


No point in spending a few billion dollars on a trillion-parameter super LLM bot if you're training it on Fox News and Wikipedia articles when it is needed for answering quantum physics questions.


That being said, here are the general implications of an LLM having more parameters:

Increased Capacity

More parameters provide the model with a larger capacity to store and remember information. This can result in a better ability to memorize details from the training data, enabling the model to perform well on a wide range of tasks.

Enhanced Expressiveness 

A higher number of parameters allows the model to express more intricate and nuanced relationships between words and concepts. This can lead to more sophisticated language understanding and generation capabilities.

Improved Generalization

While having too many parameters can risk overfitting (memorizing the training data without generalizing well to new data), when appropriately regularized and trained, a model with more parameters may be better at generalizing to diverse inputs and tasks.


Better Performance on Specific Tasks

Models with more parameters may perform exceptionally well on specific tasks that require a deep understanding of context, such as language translation, summarization, and question-answering.

Higher Computational Requirements

Training and using models with more parameters require more computational resources, including larger datasets, longer training times, and increased memory.

This can impact the practicality of deploying and using such models.


If you're interested in the technical side of building these LLMs and how the parameters are trained, skip yourself down to the technical bit at the bottom of the post. For now, however, we can look at the application of some of these technologies in the NHS.


LLM and the NHS

With AI being hailed as the savior of all things, NHS included, it's probably worth looking at a couple of specific applications of the technology.




Chatbots

It's hard to think of a business or customer service situation that doesn't use a chatbot to some extent these days. Recently returned a product to Amazon?

The minute you want some information about your purchase, basic troubleshooting, returns, or feedback, it's often largely driven by a relatively successful AI chatbot.

Given that the NHS is essentially service provision, it's not hard to imagine how such bots could be put to good use...

From information giving and personalised medicine advice, to actual medical care - if a human could provide this service by typing it, no reason an AI chatbot couldn't do the same. Probably better, or at least more accurately.


For example, I often get asked by friends and family on the phone about medical problems (I think all of us in the profession can attest to this, it's a bit of an honour to be asked, and I love to help out) but when I get asked about a rash on a newborn - as an oncologist who hasn't done A&E medicine for about 6 years - I'm giving advice with slightly wide eyes hoping that it's correct... Maybe the future of this isn't the well-meaning friend who happens to be a medic, but a well-trained pediatrician AI bot who can look at your phone WhatsApp images without mumbling 'bloody hell learn to focus a picture' under its breath.


Coming back to my initial comment about the delays throughout the service, it's hard not to think that maybe a chatbot intervention could actually take pressure off the service enough so that those who really need to speak to a human, can. Cue PhD Music.


In fact, this exact idea is being trialled with a mental health chatbot called 'Wysa' -


'Wysa provides a model that emphasises prevention and symptom management in addition to treatment. From cognitive-behavioural techniques (CBT) and meditation to breathing exercises, yoga, and motivational interviewing, Wysa has an extensive library of on-demand resources that help patients manage their mental health.'

Sounds great doesn't it!


Whether patients will be up for it knowing that it's a robot is yet to be seen, but I think they will. We don't need a human to verify what we Google to know if it's correct-(ish).


All the wonders of potential aside, there is understandable fear about using any kind of generative AI in healthcare. The potential for error, hallucination, and bias is a mega risk. Because unlike buying a new TV and getting the wrong measurements from the bot, the errors have much further reaching consequences.


For example, you don't want Wysa to be chatting away with someone who is really struggling with their mental health, to turn around and say:

Patient: 
"Blah blah blah, and then this happened, blah blah blah, and now I'm sad all the time!" 

AI (Not Wysa but potentially Wysa) chatbot:
"Bloody hell mate, that sounds awful! Life is really rubbish for you right now. Fair enough if you want to end it. Can't argue with that!"

Tongue-in-cheek comment aside, you can see how such a technology poses a real and substantial risk if it gets it wrong.


This is part of the reason we are calling out for proper pilots and prospective randomised controlled trials for such systems. We need to prove these systems are safe and effective.

Easier said than done though, especially in a cash-strapped system where such innovative interventions are designed from the outside in, and try to be shoehorned in.

Destined to fail at the point of implementation.

So we are in a bit of a stalemate, particularly in a system like the NHS, which is fearful of such changes at the best of times.


It's almost like someone needs to build a useful, generative AI system from within the NHS, prove it is safe and effective from pilot to RCT, and create a blueprint for other innovations going forward. Cue PhD music again.


Electronic Health Record Summary

So much time as a doctor is spent gathering information from patients' notes and records.

Summarising the patient's stay, looking up the journey so far before the clinic, getting blood tests, imaging, and observations, there is a massive amount of information to ingest before even seeing a patient. This has so much potential for AI it is screaming to be automated. And people are trying. It's not without its own challenges though.


Medicine has its own language. Rooted in the Greek and Latin origins, sentences such as 'the patient is dyspnoeic secondary to their bronchoconstriction and spasm' rather than 'they are having an asthma attack' are commonplace. Med School was fun :D


This language of medical data is clearly its own beast, and doesn't sit inside the realm of the typical day-to-day vernacular. So when building a chatbot, particularly one you want to read medical notes and give a summary, the nomenclature won't necessarily be revealed through standard LLM training without getting a bit specialist.

Training these AI models needs large amounts of data, and perhaps the way to do it is by using patient notes.


Sounds like the obvious dream scenario, right?! We have millions of patients and millions of records, so let's just feed it all into our LLM and get summaries.

Not so easy. Half the notes are scanned-in pieces of paper with handwritten notes.

(As we all know, doctors' handwriting is totally illegible even by the most advanced human, AI doesn't have a hope lol). The letters may not be in one place if a person has moved around, there are different electronic health systems to be navigated, and I haven't even gotten started on the rules and regulations around data protection / anonymization / information governance.

When it comes to the NHS and the UK, rightly or wrongly, we have to jump through an exceptional number of hoops to access the data. It would then need to be cleaned and structured into a usable format for any kind of machine learning.


You can see why there might be challenges here.  Cue PhD music again.


All the problems aside though, the potential is massive and there are ways of mitigating the risks. One example of a technique to help guide the chatbots to the right answers is Retrieval-Augmented Generation (RAG)...


RAG is a technique used in natural language processing that combines elements of both retrieval-based and generation-based approaches to enhance the quality and relevance of the generated text.

To make medical health chatbots safer, RAG can be used to retrieve information from specialized databases or documents, making the chatbot more effective in its specific context.

The chatbots can connect to external documents and ground their responses on these documents. RAG can also produce document citations in its responses, which is a mega win for those generative responses needing to provide accurate information.
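To show the shape of the idea, here's a toy sketch of the "retrieve, then ground the prompt" step. Big caveats: this is not Haystack and not how a production system would do it. The three documents, the word-overlap scoring, and the prompt wording are all my own hypothetical stand-ins (real systems typically use a proper search index or embeddings, and then pass the prompt to an actual LLM).

```python
import re

# Hypothetical mini knowledge base, for illustration only.
# A real system would index vetted, specialist sources.
DOCUMENTS = {
    "doc1": "Paracetamol is commonly used to treat mild pain and fever.",
    "doc2": "Chemotherapy can cause nausea, fatigue and hair loss.",
    "doc3": "Radiotherapy uses radiation to destroy cancer cells.",
}


def tokenize(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents.items(),
        key=lambda item: len(question_words & tokenize(item[1])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, retrieved):
    """Build a prompt that asks the model to answer only from the sources."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieved)
    return (
        "Answer using only the sources below and cite them by id.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )


question = "What side effects can chemotherapy cause?"
retrieved = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, retrieved)
print(prompt)  # this grounded prompt is what would be handed to the LLM
```

The document ids in the prompt are what make citations possible: the model is asked to point back at the source it used.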


One exciting tutorial that I'll be doing here is from Deepset Haystack, using a RAG pipeline to combine PubMed data with an LLM: Check it out:


Now extrapolating this idea, let's imagine you're a patient with a disease - and you want to ask questions about it. Without going on Google or getting a random selection of results thrown at you by a general LLM, maybe the future holds a chatbot that is specific to you, your disease, your pathology, your co-morbidities. Whilst that won't be for every disease, cancer is something that lends itself to that very well.

Cue the PhD music Again.



Summary and Closing remarks (Pre-technical bit)

The potential for the use of LLM is massive in the NHS and for chatbots and EHR specifically, it could be a game changer.

We just need to get patient data ethically and legally, with consent where required. We need to anonymise it, clean it, label it, and hope it is legible, so we can train our models with it...

Then test those models to make sure they're safe, then prove they give a benefit with a well designed trial, then find a way to roll them out to the wider NHS - A system that doesn't really like change and has been using the same service model since its inception in 1948. Easy! :D

Cue PhD Music Again.


Thanks for listening.


Now for the keeny beanies out there... A bit of a chat on The technical bit:

Back to our LLM creation.


Creating The Parameters

Parameters aren't made, but learnt.

During training, the model goes through a process called optimization, where it learns the optimal values for its parameters by adjusting them iteratively to minimize the difference between its predictions and the actual target values (supervised learning) or by maximizing a reward signal (reinforcement learning). You may have heard me babbling on about this before: #Minimizing the loss function.

Honestly, you could reduce the whole of AI down to that one sentence.

Starting a hashtag trend.


Building an LLM is a pretty complex, maths-heavy, and computationally heavy business.

Just in processing power alone, you would need thousands of optimised GPUs to build one.

A back of the envelope calculation: training GPT-3 is estimated to have used about 1.287 gigawatt hours (1,287 MWh), roughly the same as running 120 US homes for a year.
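If you want to check the envelope yourself, the arithmetic is short. The only number here that isn't in the post is the household figure: I'm assuming an average US home uses roughly 10,700 kWh of electricity a year, which is a ballpark assumption rather than an exact statistic.

```python
# Figure quoted above: about 1.287 gigawatt hours, i.e. 1,287 MWh.
training_energy_mwh = 1287

# Assumption (not from the post): a rough average for one US household.
home_kwh_per_year = 10_700

training_energy_kwh = training_energy_mwh * 1000  # 1 MWh = 1,000 kWh
homes_for_a_year = training_energy_kwh / home_kwh_per_year

print(round(homes_for_a_year))  # about 120 homes
```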

(So this isn't something we'll be building from our laptops team ;))....


Fortunately, we can take a skyscraper view to get the basics under our belt.





Skyscraper view of how an LLM is built :


For those of you who did my building a neural network tutorial - a lot of these ideas will be familiar - the machine learning process is essentially the same.


A bit about the transformer architecture

The transformer architecture is based on something called the attention mechanism.

The attention mechanism allows the model to focus on the most relevant parts of the input sequence when generating the output sequence.


It's based on the idea of assigning weights to different parts of the input sequence based on their relevance to the output sequence.


These weights are learned during training and are used to compute a weighted sum of the input sequence, which is then used to generate the output sequence.

The weights are computed using a softmax function, which you may remember us messing around with when we built our first NN together :) Ah, memories.
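Here's a minimal sketch of that "softmax weights, then weighted sum" idea, using the scaled dot-product form of attention. It's a simplification under stated assumptions: the token vectors are random numbers, and I'm using the same vectors as queries, keys and values, whereas a real transformer first passes them through learned projection matrices.

```python
import numpy as np


def softmax(x):
    """Turn each row of scores into weights that are positive and sum to 1."""
    shifted = x - np.max(x, axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def attention(queries, keys, values):
    """Scaled dot-product attention: relevance scores -> weights -> weighted sum."""
    d_k = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)  # how relevant is each token to each other token
    weights = softmax(scores)                 # one row of attention weights per token
    return weights @ values, weights          # weighted sum of the value vectors


# Three made-up token vectors of length 4 (random, purely illustrative).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 4))

output, weights = attention(tokens, tokens, tokens)
print(weights)  # a 3x3 grid: row i says how much token i attends to each token
```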


Encoder - Decoder Structure

The transformer architecture follows an encoder-decoder structure.

What does this mean?


A good example of this in action is translation.

Input: "I am a student."
Encoder: [0.2, 0.5, 0.1, 0.3]
Decoder: "Je suis étudiant."

The encoder output above, which is just a sequence of numbers to us, is worked out by the transformer architecture.

The encoder is a stack of layers that does clever stuff with the input, and that collection of numbers is a vector that captures all the relevant information from the input.

Pretty awesome eh?


It then passes this information to the decoder to generate the output.

The decoder uses the attention mechanism to focus on the most important parts of the input.

The decoder is auto-regressive, meaning that it reads the previously generated words as additional input when generating the next word.

Basically making sure that the sentence makes sense.
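The auto-regressive loop can be sketched in a few lines. To be clear, the "model" here is a hypothetical lookup table I've hard-coded for the translation example above, not a trained network. A real decoder would also look at the encoder's output and produce a probability for every word in its vocabulary.

```python
# Hypothetical stand-in for a trained decoder: maps the words generated
# so far to the next word. "<end>" means the sentence is finished.
TOY_MODEL = {
    (): "Je",
    ("Je",): "suis",
    ("Je", "suis"): "étudiant.",
    ("Je", "suis", "étudiant."): "<end>",
}


def generate(model, max_words=10):
    """Generate one word at a time, feeding the output so far back in."""
    generated = []
    for _ in range(max_words):
        next_word = model[tuple(generated)]  # previous words are the input
        if next_word == "<end>":
            break
        generated.append(next_word)
    return " ".join(generated)


print(generate(TOY_MODEL))  # Je suis étudiant.
```

The key line is the lookup on everything generated so far: each new word depends on the words before it.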


Training the transformer

Let's talk a bit more about the encoder and the decoder, each of which contains multiple layers of self-attention and feedforward neural networks.

The encoder processes the input text, while the decoder generates the output text.

The self-attention mechanism allows the model to attend to different parts of the input text at different stages of processing, enabling it to capture long-range dependencies and contextual information.

The feedforward neural networks are used to transform the output of the self-attention layers into a form that can be used by the next layer.


Let's go through the process of training from here:


Initialization 

The model's parameters are initialized with random values. These parameters include weights and biases associated with the connections between neurons in the neural network.


Forward Pass 

During the forward pass, input data is passed through the neural network, and the model generates predictions. For language models, this involves processing input sequences to predict the next word or generate a response.
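As a rough sketch of these first two steps, here is random initialization followed by a single forward pass. Everything is shrunk to toy size and is my own assumption: a three-word vocabulary, a single layer, and a made-up four-number vector standing in for the processed input sentence.

```python
import numpy as np

vocabulary = ["mat", "rug", "floor"]  # toy vocabulary, for illustration
n_features = 4                        # hypothetical size of the input vector

# Initialization: parameters start out as small random values.
rng = np.random.default_rng(42)
weights = rng.normal(scale=0.1, size=(n_features, len(vocabulary)))
biases = rng.normal(scale=0.1, size=len(vocabulary))


def forward(x, weights, biases):
    """Forward pass: input vector -> one score per word -> probabilities."""
    logits = x @ weights + biases
    exp = np.exp(logits - logits.max())  # softmax, shifted for stability
    return exp / exp.sum()


# Made-up numbers standing in for an encoded "The cat sat on the".
context = np.array([1.0, 0.0, 0.5, -0.5])

probabilities = forward(context, weights, biases)
for word, p in zip(vocabulary, probabilities):
    print(f"{word}: {p:.2f}")
```

Because nothing has been trained yet, the probabilities come out close to an even split. Training is what nudges "mat" to the top.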


Loss Calculation 

The predictions are compared to the actual target values (ground truth), and a loss is calculated. To illustrate the ground truth here (and how ChatGPT started off), we can think of the LLM predicting the next word in the sequence.

We might give a ground truth as below:

Input Sequence: "The cat sat on the"

Ground Truth: "mat"

During training, the model processes the input sequence and generates a probability distribution over the vocabulary for the next word. The ground truth, in this case, is the correct word that should follow the input according to the training data.

Input to Model: "The cat sat on the"

Model Prediction: 

Probability distribution over the vocabulary (e.g., "mat": 0.6, "rug": 0.3, "floor": 0.1).

Ground Truth: "mat"

Loss Calculation: The model's prediction is compared to the ground truth using a loss function.

The loss quantifies the difference between the predicted values and the true values.

As I've mentioned before, the goal of optimization is to minimize this loss.
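Plugging in the numbers from the cat example: I haven't named a specific loss function above, so as an assumption I'm using cross-entropy here, which is a common choice for next-word prediction. It is simply the negative log of the probability the model gave to the correct word.

```python
import math

# The model's prediction from the example above.
prediction = {"mat": 0.6, "rug": 0.3, "floor": 0.1}


def cross_entropy(prediction, ground_truth):
    """Loss is low when the correct word was given a high probability."""
    return -math.log(prediction[ground_truth])


loss = cross_entropy(prediction, "mat")
print(round(loss, 3))  # 0.511

# If the ground truth had been "floor", the model would be more wrong:
print(round(cross_entropy(prediction, "floor"), 3))  # 2.303
```

A perfect prediction (probability 1.0 on "mat") would give a loss of zero, which is what training is pushing towards.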


Backward Pass (Backpropagation of errors)

Like the forward Pass, but in the other direction ;)

Again, the key idea behind backpropagation is to iteratively adjust the weights and biases of the neural network in the direction that reduces the error or loss.


The gradients calculated during the backward pass guide these adjustments. The process is repeated until the model converges to a state where further updates do not significantly reduce the loss.
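To give a feel for what a gradient is, here's a small sketch for the very last step of the network only, assuming a softmax output with a cross-entropy loss (my assumption, and the scores are made up). For that combination the gradient with respect to the scores works out as "predicted probabilities minus the one-hot ground truth", and we can sanity-check it by nudging each score and watching the loss move. Real backpropagation then carries this backwards through every layer using the chain rule.

```python
import numpy as np


def softmax(z):
    exp = np.exp(z - z.max())
    return exp / exp.sum()


def loss_fn(logits, target_index):
    """Cross-entropy loss for a single prediction."""
    return -np.log(softmax(logits)[target_index])


def gradient(logits, target_index):
    """Analytic gradient of the loss with respect to the scores (logits)."""
    grad = softmax(logits)
    grad[target_index] -= 1.0  # probabilities minus one-hot ground truth
    return grad


logits = np.array([2.0, 1.0, 0.1])  # made-up scores for "mat", "rug", "floor"
target_index = 0                    # ground truth: "mat"

analytic = gradient(logits, target_index)

# Numerical check: nudge each score a tiny bit and measure the change in loss.
eps = 1e-6
numeric = np.zeros_like(logits)
for i in range(len(logits)):
    bumped = logits.copy()
    bumped[i] += eps
    numeric[i] = (loss_fn(bumped, target_index) - loss_fn(logits, target_index)) / eps

print(analytic)
print(numeric)  # should agree closely with the analytic gradient
```

The negative entry for "mat" says "raise this score to lower the loss", and the positive entries say "lower these".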


Update Parameters

The optimization algorithm uses the calculated gradients to update the model's parameters.


Iteration 

These steps are repeated many times over the training dataset. Each complete pass through the entire dataset is called an epoch.

Each iteration refines the model's parameters, gradually reducing the loss.


Convergence 

The optimization process continues until the model reaches a point where further adjustments to the parameters do not significantly reduce the loss. At this point, the model is considered to have converged.
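Putting the whole loop together, here is a deliberately tiny sketch: one training example, a three-word vocabulary, a single layer, and plain gradient descent. The learning rate, tolerance, and input vector are all hypothetical values I picked for illustration, and real LLM training uses fancier optimizers and vastly more data, but the cycle is the same: forward pass, loss, gradients, update, repeat until the loss stops improving.

```python
import numpy as np


def softmax(z):
    exp = np.exp(z - z.max())
    return exp / exp.sum()


vocabulary = ["mat", "rug", "floor"]
target_index = vocabulary.index("mat")      # ground truth
context = np.array([1.0, 0.0, 0.5, -0.5])   # made-up encoded input

# Initialization: small random parameters.
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(4, 3))
biases = rng.normal(scale=0.1, size=3)

learning_rate = 0.5   # hypothetical step size
tolerance = 1e-6      # stop when the loss improves by less than this
previous_loss = float("inf")
losses = []

for step in range(10_000):
    # Forward pass and loss calculation.
    probs = softmax(context @ weights + biases)
    loss = -np.log(probs[target_index])
    losses.append(loss)

    # Convergence check: has the loss stopped improving?
    if previous_loss - loss < tolerance:
        break
    previous_loss = loss

    # Backward pass: gradient of the loss with respect to the scores.
    grad_logits = probs.copy()
    grad_logits[target_index] -= 1.0

    # Update parameters: step against the gradient.
    weights -= learning_rate * np.outer(context, grad_logits)
    biases -= learning_rate * grad_logits

print(f"stopped after {step + 1} steps, final loss {losses[-1]:.4f}")
print(dict(zip(vocabulary, np.round(probs, 3))))
```

By the end, nearly all of the probability sits on "mat". #Minimizing the loss function.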


Summary

This is a complicated area of machine learning, and it's difficult to give a surface-level overview without going arbitrarily deeper to gain further understanding.


The best way to learn such things is to build them yourself, so the next blog will be a tutorial on the RAG chatbot via the awesome Deepset Haystack framework.

Looking forward to seeing you there. Peace!


Cue PhD Music!....







