Easy wins: Part One

The NHS could use some easy wins.

Today I'm going to walk through a process where I found a problem, used some code to automate it, and saved consultants a bucketload of time and frustration.

This post is about what we can do as coders to bring marginal gains to the NHS, improving things 1% at a time.

Most of us in the NHS see lots of little things on our daily rounds that could be easily fixed.

They are little things, not big enough to draw the attention of those in the business of fixing problems, nor are they problems that would ever be 'gotten to' on a system level, as let's face it, there are bigger fish to fry.

But maybe, just maybe, we can fix some of these problems with our own homegrown solutions.

(Without needing to enter the labyrinth-esque red tape that surrounds the goblin king that is innovation in the NHS)

We can build something, use it, and thanks to the principle of marginal gains, make the NHS a little bit better - just 1% at a time.

If enough of us take this idea on, there are a lot of gains to be made.

Side note:

Coming up with a way to fix these problems won't turn you into a venture capital-backed founder.

This is handy if you don't want to give it all up to go on the breadline, throwing in with the countless other health tech companies out there desperate to milk from the teat of the government and its promise of fixing the NHS with AI and buckets of cash.

And right now I can promise you, you're not going to make any money out of it.

But, you'll probably save some people some time, make life a bit easier, and get some coding joy out of it too.

(As an added bonus, for those in training posts, I'm sure it will qualify as a quality improvement project too. Hell, maybe even a poster.)

So today let's talk about an easy win I just finished for clinic prep: How it came to pass, where the opportunity arose, and for those of you who are keen, how I coded it.

What you'll need for today's exercise:

A compiler or interpreter of some sort installed at work. There are a lot of options, but many of them you won't be allowed to install. I asked our I.T. team if they would be willing to grant me admin rights to install Python. They said yes! Win! :D There are lots of good reasons to use Python: like R, it can be used for a lot of statistical analysis, and it has the added benefit of being useful for machine learning. Failing Python, try PowerShell, or maybe Java? Have a conversation with I.T. and explain your rationale. Use whatever you can confidently code in and that has a chance of actually working within a rights-restricted environment without any dodgy hacking.
A problem to solve (or a time-saving idea that can be built without messing with current I.T. infrastructure). Problems will not be difficult to find. The creative part is working out a solution that needs minimal change in effort from the end users, and how to draw useful information from the EHR and associated systems.
A few personal hours to kill and a willing user to test your time-saving idea. Programming takes time. So does testing. No useful program in the history of programs has ever worked first time. So build testing time in, with the end user involved, and don't overpromise ease or capability. (I added the overpromise line because I do that all the time :))

Let's crack into it.

The Clinic Prep Problem:

Clinic prep is a process that takes a lot of time.

Time that is almost never built into anyone's schedule, and therefore is almost always done exclusively in one's own time.

So what is clinic prep?

Let's walk through the problem we're dealing with today.

You open up the list of patients that are coming to Monday's clinic, and let's say you have 35 patients on the list.

You can't just turn up to that blind.

You've got a window of time to see the patient, and if you had to spend 5-10 minutes in clinic working out why they are there, where they are on their cancer journey, looking up CT results, bloods, etc. and making a plan on the fly... You'd run hours late every time.

Bottom line, for big clinics you need to do the prep.

One common approach is to grab the list from our clinic program (TRAK) and go through each patient and put together a summary of what's happening.

This is usually a Word document with the pertinent details and a running summary with relevant updates at each appointment.

This isn't done from fresh each time. The majority of our patients, especially in the bigger follow-up clinics, are ones we've met before who are on treatment or surveillance, and we have previously made a summary of their background.

We can then top up the summary with the most up-to-date results before seeing them in the clinic.

These patient summaries are stored in a directory of many, many weeks of Word documents.

So ... what's the problem?

Getting the clinic list, going through each patient, and looking up their most recent entry from the Word documents available takes time.

You're searching for the patient in a directory of previous clinic lists, finding their most recent summary, and putting it into a new Word document next to the appropriate clinic details. Even if you're a slick IT whizz, this would take you around 30 seconds per record.

Many of my colleagues, to their credit, are much better doctors than IT gurus, so it might take them a minute or more per record.

By the time you've got 30 patients on a list, that's up to 30 minutes before you've even got started on updating it.

This problem is agony for me; it's screaming to be automated. (And honestly, it should have been built into the EHR from the start, and yadda yadda yadda ... same old rant about poor user involvement.) But it wasn't, so without further whinging, and as per today's title, let's fix it!

Before embarking on any kind of easy win, it's worth doing a back-of-the-envelope 'calculation'.

Calculate what?

Is it possible?

Will it save time?

How much time?

Will people use it?

Don't waste your time on impossible fixes, or on ones no one will use because the new system takes more work than the old way.

People are lazy creatures of habit. It needs to be easy and the path of least resistance.

So let's do the basic questions for the clinic prep problem:

The clinic Prepper: Is it Possible?

Yes. It will need to use a couple of libraries that access office documents like Excel and Word, but this should be as easy as peanuts to automate.

Will it save time, is it cost-effective?

Let's be very conservatively quantitative about this.

There are about 50 consultants at the Beatson.

Maybe half of them use this method for their clinic prep. (Ignoring us registrars ;))

Let's say of those 25, half go on to actually use the program, the other half are happy with their tried and tested method.

So we are talking about 12ish people now.

Let's say each clinic has 30 people in with an average of 30 seconds per record search, copy, and paste.

So we've got 15 minutes per clinic and, to keep it simple and conservative, let's say this is only used for the bigger weekly clinics, one a week.

12 consultants * 15 minutes = 3 hours a week.

Over a year (52 weeks) that is 156 hours wasted copying and pasting into Word documents.

At the risk of causing a storm, to quantify in cash dollars, I thought I'd use an average salary figure for UK oncology consultants.

The top Google result was £48.08 per hour.

Link included - don't blame the messenger.

So what are we looking at then? £48.08 * 156 hours = £7,500.48

Seven and a half grand a year for copying and pasting!

This can't happen soon enough.
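For the code-curious, the same back-of-the-envelope sum can be sketched in a few lines of Python. The function and parameter names are made up for illustration, and the inputs are the rough guesses from above, not measured data.

```python
def estimate_yearly_saving(consultants, patients_per_clinic, seconds_per_record,
                           clinics_per_week=1, weeks_per_year=52,
                           hourly_rate_gbp=48.08):
    """Rough estimate of hours and pounds spent on copy-and-paste per year."""
    minutes_per_clinic = patients_per_clinic * seconds_per_record / 60
    hours_per_week = consultants * clinics_per_week * minutes_per_clinic / 60
    hours_per_year = hours_per_week * weeks_per_year
    cost_per_year = round(hours_per_year * hourly_rate_gbp, 2)
    return hours_per_year, cost_per_year


# The guesses used in this post: 12 users, 30 patients, 30 seconds per record
hours, cost = estimate_yearly_saving(consultants=12, patients_per_clinic=30,
                                     seconds_per_record=30)
print(f"{hours:.0f} hours a year, about {cost:,.2f} pounds")
```

Changing any one guess (say, a minute per record instead of 30 seconds) shows quickly how sensitive the total is to the assumptions.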

Summary

It's a good idea to fix it.

It can be done. People might use it, and most probably will as word spreads.

And it will save a relative ton of cash. So let's get cracking with the approach.

This bit gets techy, so this next chunk may not be of interest to y'all. But for those who are code-curious, I go into plenty of detail for accessibility :) Grab some popcorn.

For those of you ditching at this stage, stay tuned, and thanks for visiting.

It's coding time!

The approach:

One of the key parts of system analysis and designing a program is fully understanding the process.

Everything done needs to be explicitly stated and mapped. Computers are as dumb as the instructions you give them. They can't guess anything.

Therefore before even turning on a computer, you need to sit with the person who does the process you want to automate and go step by step through it.

You are mapping their actions to the computer. Any misunderstanding here will make the program completely pointless. Always remember that.

The process:

Currently, this is what happens:

Manual information retrieval from EHR:

Open the clinic list for the respective date on TRAK.

Copy and paste the time and Community Health Index (CHI) number and put them into the Word document or Excel spreadsheet.

(This is done one line at a time to avoid extraneous information being copied out of TRAK).

For Each Community Health Index (CHI) number:

Copy the CHI number, and search in a folder containing all the previously prepped lists for that number:

From the returned list, open the document with the most recent entry.

Note - All Word documents are saved with the date as a suffix.

From this document, search again for the CHI number and copy the information under it.

Paste the information into the new Word document next to the clinic time and CHI number.

Repeat the search, copy and paste steps above for each CHI number until every patient has been added to the list.

Manual updating process:

Go through the list and amend it with any updates from patient results. (This part will not be automated as it requires intelligence.)

Conclusion:

So this, to me, is an easy win.

I was excited to build it because I can really see the value in saving a consultant from performing this mind-numbing task. I also thought it would be very handy to see when the patient was last seen. This is my own sugar on top, and it should be easy to glean from the date of the letter.

Coding:

Start with a bit of Pseudo Code

Even though this program sounds really simple, it's amazing how quickly things grow as you deal with new and unexpected challenges. (And this one was no different.)

When you're building bigger things, the approach to coding is to sit down and draw it out first, plan how things will work together, and go from there.

With something where we have a relatively straightforward plan, you can do that with pseudo-code just to lay out your thought process.

I've always been a fan of having a module that is the brain of the operation to plan out my code and then build modules from there.

So let's get started with our first module - program logic:

program_logic.py
# Input clinic time and CHI numbers
# organise list of documents into chronological order
# take CHI number and search through documents for CHI
# When found, stop looking for that CHI, save the result to a new word document in clinic time order and move onto the next CHI
# Finish the program and let the user know if it worked or failed

I haven't discussed building a GUI at this stage. This is intentional.

I built the GUI at the end.

Spending too much time on a GUI when the underlying logic of a program is shoddy is something I've been guilty of in bigger projects in the past.

More than once. And it's a real shame.

No-one cares how pretty it is if it doesn't work.

For this program, it needs to be functional and accurate.

Frankly, it could be run from a command line.

(although I'm not going to ask non-computer literate oncology consultants to run Python command strings :))

A GUI is definitely going to be needed, a very simple one. But definitely something to come back to when the rest is working well, rather than dedicate time to it at the start.

Early design and functionality decisions

I've had to make some decisions about functionality first off.

The format that the TIME / CHI comes out from TRAK lends itself to being pasted directly into an Excel sheet rather than Word.

On discussion with the end user, this would be the preferred method of feeding the program the information. So rather than start with a Word document with a few details to build out, the cleaner (and likely easier) solution is to read in from Excel and create a new Word document with the clinic prep built directly into it.

Here we go! :D

Part 1: Input clinic time and CHI numbers

program_logic.py
# Input clinic time and CHI numbers
# organise list of documents into chronological order
# take CHI number and search through documents for CHI
# When found, stop looking for that CHI, save the result to a new word document in clinic time order and move onto the next CHI
# Finish the program and let the user know if it worked or failed

As you can see with this code, I've created a clinic_prepper function that takes an input_excel_file and extracts the clinic times and CHI numbers.

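# program_logic.py
import chi_finder

def clinic_prepper(input_excel_file):
    # Import the clinic times and CHI numbers from the Excel file
    print(f"Reading in File : {input_excel_file}")
    # get_chi_numbers lives in the chi_finder module, so call it through the module
    time_chi_dictionary = chi_finder.get_chi_numbers(input_excel_file)
    print(f"Generated time and CHI dictionary {time_chi_dictionary}\n")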

So the first module to create is to get chi numbers and times out of Excel and return them to this clinic prepping function.

I've defined this function inside a module called chi_finder in case I need to expand this later on to contain other functions or logic.

I used the data analysis library 'pandas' because I've used it before and it makes it really easy to get info out of Excel.

Here's my code below:

# chi_finder
import pandas as pd  # Excel file handler


# function to return a dictionary of times and chi numbers from excel sheet
def get_chi_numbers(excelFile):
    print(f"Running Excel import from {excelFile}")
    clinic_file = pd.ExcelFile(excelFile)

    # Read in the times and CHI numbers from Sheet1, using the columns named "Time" and "CHI"
    clinic_sheet = clinic_file.parse("Sheet1", converters={"Time": str, "CHI": str})

    # Drop the rows where 'CHI' is NaN
    clinic_sheet = clinic_sheet.dropna(subset=["CHI"])

    # save data from sheet as series
    appt_time = clinic_sheet["Time"]
    chi_number = clinic_sheet["CHI"]

    # replace the '(B)' suffix with ':00' (regex=False so the brackets are treated literally)
    appt_time = appt_time.str.replace("(B)", ":00", regex=False)

    # make appt_time into time format from string and remove date
    appt_time = pd.to_datetime(appt_time, format="%H:%M:%S")
    appt_time = appt_time.dt.time

    # Convert the Series to a list
    appt_times = list(appt_time)
    chi_numbers = list(chi_number)

    # combine the lists together into a time/chi paired dictionary
    paired = zip(appt_times, chi_numbers)
    time_chi_dictionary = dict(paired)

    # return the time/chi dictionary
    print(time_chi_dictionary)
    return time_chi_dictionary

#test the program
if __name__ == "__main__":
    get_chi_numbers(r"clinics\Clinic Prepper\Clinic today.xls")

Hopefully, this is largely self-explanatory.

I'm pulling data from columns named "Time" and "CHI" respectively.

You'll notice that I've built in a mechanism to deal with missing values (NaN, 'Not a Number') for any empty clinic slots. These broke the program to start with, as I wasn't expecting any. Finding that out early is a good thing.

I've also dealt with a strange practice whereby overbooked (or duplicated) clinic time slots have a (B) suffix, probably indicating a double booking. As I'm converting the time information into a datetime value, this needs to be cleaned up or it will also break the program.

This all worked really nicely, and I've zipped up my lists together into a time_chi_dictionary to then use to look through my word docs.
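One thing worth watching with a time-keyed dictionary: a Python dict keeps only one value per key, so once the (B) suffix is cleaned off, a double-booked slot has the same time as the original booking and the later CHI overwrites the earlier one. A minimal sketch of one way around this is to keep the pairs as a list of tuples instead. The function name and the sample times and CHI numbers are made up for illustration, and it assumes the raw strings look like "09:30:00" and "09:30(B)".

```python
def pair_times_and_chis(appt_times, chi_numbers):
    """Return (time, CHI) tuples in clinic order, keeping double-booked slots."""
    cleaned = [t.replace("(B)", ":00") for t in appt_times]
    return list(zip(cleaned, chi_numbers))


# Made-up example data: the second slot is a double booking of the first
times = ["09:30:00", "09:30(B)", "10:00:00"]
chis = ["1111111111", "2222222222", "3333333333"]

pairs = pair_times_and_chis(times, chis)
as_dict = dict(pairs)

print(len(pairs))    # every patient is kept in the list of tuples
print(len(as_dict))  # the two 09:30 entries collapse into one dictionary key
```

Keying on the CHI number instead of the time would be another option, since each patient should appear on the list only once.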

Data Structure and accessibility Thoughts

One of the immediate realisations at this stage is I'm going to have a relatively complex data structure. It's going to be a dictionary with 4 keys - [Time, CHI, Summary Text, Last Seen Date].

Perhaps an ideal way to store this could be a singleton class called 'clinic list'.

(A singleton means only one instance can ever be created, and that one instance can be accessed and used throughout the program, a bit like a global variable.)

Therefore it is sensible to define my class now, and build the program around it using the class to store the data. I shall reveal this in part two as we develop more of the program.
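In the meantime, here is one hypothetical shape such a structure could take, purely as a sketch. It is not the class from the finished program: the names ClinicEntry and ClinicList and their methods are assumptions for illustration, and a module-level instance stands in for a stricter singleton.

```python
from dataclasses import dataclass, field
from datetime import date, time
from typing import List, Optional


@dataclass
class ClinicEntry:
    """One patient row: time, CHI, summary text and last seen date."""
    appt_time: time
    chi: str
    summary_text: str = ""
    last_seen: Optional[date] = None


@dataclass
class ClinicList:
    """Holds every entry for one clinic."""
    entries: List[ClinicEntry] = field(default_factory=list)

    def add(self, entry):
        self.entries.append(entry)

    def in_time_order(self):
        # sorted() is stable, so double-booked slots keep their original order
        return sorted(self.entries, key=lambda e: e.appt_time)


# A single module-level instance is one simple way to get singleton-like behaviour
clinic_list = ClinicList()
clinic_list.add(ClinicEntry(time(10, 0), "2222222222"))  # made-up CHI numbers
clinic_list.add(ClinicEntry(time(9, 30), "1111111111"))
print([entry.chi for entry in clinic_list.in_time_order()])
```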

On to part two!!

Part 2: Organise list of documents into chronological order

program_logic.py
# Input clinic time and CHI numbers
# organise list of documents into chronological order
# take CHI number and search through documents for CHI
# When found, stop looking for that CHI, save the result to a new word document in clinic time order and move onto the next CHI
# Finish the program and let the user know if it worked or failed

Next, we need to get the list of documents to read through and organise them chronologically so that when we search for the CHI, we get the most recent version and stop looking.
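To show why the ordering matters, here is a minimal sketch of the "first hit wins" search. Plain strings stand in for the Word documents, and the function name, filenames and CHI numbers are all made up for illustration.

```python
def find_most_recent(chi, documents):
    """Return (filename, text) of the first document that mentions the CHI.

    `documents` must already be sorted newest first, so the first hit
    is the most recent one and we can stop looking straight away.
    """
    for filename, text in documents:
        if chi in text:
            return filename, text
    return None  # this patient has no previous summary


# Made-up documents, newest first
documents = [
    ("Clinic 150324.docx", "2222222222 On surveillance, CT due"),
    ("Clinic 010224.docx", "1111111111 Cycle 3\n2222222222 CT booked"),
]
print(find_most_recent("2222222222", documents)[0])
```

If the list were not sorted, the search would have to open every document and compare dates before it could be sure which entry was the latest.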

The first real problem:

One of the fun discoveries when working with the documents was that they were not all the same .docx format.

There was an arbitrary split between docx and doc files.

It took me a while to work out that our remote systems (those used by us at home to prep the list) save in the newer format of 'docx'.

The desktop setup for those in the hospital uses the older format 'doc'.

That's right people, we are still defaulting to a file format from over 20 years ago on our work systems. Good luck with the NHS AI PhD Sam - haha.

Therefore an additional step we need to do is to convert the older format to the newer one. doc to docx.

This will make the whole process consistent, and looking at the available libraries to create Word documents, will provide an easier path to success.
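Before converting anything, it helps to know which files actually need it. The sketch below only sorts paths into two piles by file extension. The conversion itself needs Word or another converter and is not shown here. The function name and sample filenames are made up for illustration.

```python
from pathlib import Path


def split_by_format(file_paths):
    """Separate paths into those needing conversion (.doc) and those ready (.docx)."""
    needs_conversion, ready = [], []
    for path in map(Path, file_paths):
        suffix = path.suffix.lower()  # treat .DOC and .doc the same
        if suffix == ".doc":
            needs_conversion.append(path)
        elif suffix == ".docx":
            ready.append(path)
        # anything else (e.g. .txt) is ignored
    return needs_conversion, ready


old, new = split_by_format(["Clinic 010224.doc", "Clinic 150324.docx", "notes.txt"])
print(f"{len(old)} to convert, {len(new)} already docx")
```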

Function decision:

So another function decision at this step.

I need to tell the program where to look for the clinic Word documents that have been previously made.

After some back and forth, I thought the easiest way to make this work would be for users to literally drag the program folder into the directory where their files were stored.

They can then easily navigate to it and run it out of there with no need to specify any locations.

So I'm going to hard-code it to look into the parent directory for clinic files.

This can be changed in the future if needed with the GUI etc. but for now, let's see how it pans out.

Grab the files!

To make life easy, all the clinic files have a date suffix.

With the user I was working with, there were a couple of other documents in the folder we didn't want to look at. Fortunately, I was able to filter those out because they didn't have the same DDMMYY structure in the file name.

The function get_documents_in_order() is designed to find all files in the parent directory of the current script that have a date (in the format ‘ddmmyy’) in their filename.

It then sorts these files in reverse chronological order (most recent first) and returns a list of the full paths to these files.

#document_organiser.py retrieve documents from parent directory
import os
import re
from datetime import datetime

def get_documents_in_order():  
  # Get the absolute path of the current Python file
  current_file_path = os.path.abspath(__file__)
  
  # Extract the directory containing the current file
  current_directory = os.path.dirname(current_file_path)
  parent_directory = os.path.dirname(current_directory)
 
  # Regex pattern to match the date in the filename
  pattern = re.compile(r'\d{2}\d{2}\d{2}')

  # Get list of all files in the directory
  files = os.listdir(parent_directory)
 
  # Filter files based on whether they match the date pattern
  files_with_dates = [f for f in files if pattern.search(f)]

  # Sort the files by date
  files_with_dates.sort(key=lambda date: datetime.strptime(pattern.search(date).group(), '%d%m%y'),reverse=True)

  #combine path with filenames
  for i in range(len(files_with_dates)):
    files_with_dates[i] = os.path.join(parent_directory, files_with_dates[i])
    
  #print(f"output : {files_with_dates}")
  
  #testing
  print(f"Found {len(files_with_dates)} clinic files")

  return files_with_dates

# test
if __name__ == "__main__":
  get_documents_in_order()

This worked really nicely too, but it is slightly more complicated. Working with os and directories can look a bit messy and be hard to follow.

Here's a bit of step-by-step:

current_file_path = os.path.abspath(__file__)

This line gets the absolute path of the current Python file.

current_directory = os.path.dirname(current_file_path)
parent_directory = os.path.dirname(current_directory)

These lines get the directory containing the current file and its parent directory.

pattern = re.compile(r'\d{2}\d{2}\d{2}')

This line uses the 're' module, which is all about regular expressions.

We use re for string manipulation - and in this case, we are defining a string pattern of 3*2 digits - for dates, so we can find files that have the date format ‘DDMMYY’.

You might be thinking, but why bother with all the 2's, why not just say 6 off the bat?! They'll both work. That is a good point! But I wouldn't do that, and here's why:

Both will match a sequence of six digits, so functionally they are equivalent.

However, from a readability and maintainability perspective, r'\d{2}\d{2}\d{2}' is clearly meant to represent a date in the format ‘DDMMYY’.

r'\d{6}' does not convey this grouping. It's just a random line of code.

I can see myself coming back to this and thinking why the hell did I want 6 digits here?

Coding is often about making your code as clear, readable, and maintainable as possible, not just about making it work.

It's a bit of art, a bit of science. Sort of like medicine :)

Making it beautiful and clear makes it easier and more interesting for others and yourself in the future to understand! So that's my two cents on it. Do as you will :D
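To see the equivalence in action, here is a quick sketch with a made-up filename. The third pattern uses named groups, an optional extra that is not used in the program above but spells out the DDMMYY intent even more explicitly.

```python
import re

grouped = re.compile(r'\d{2}\d{2}\d{2}')  # the pattern used in this post
flat = re.compile(r'\d{6}')               # the shorter equivalent
named = re.compile(r'(?P<day>\d{2})(?P<month>\d{2})(?P<year>\d{2})')

filename = "Clinic 150324.docx"  # made-up filename

# Both plain patterns find the same six digits
print(grouped.search(filename).group())
print(flat.search(filename).group())

# Named groups let you pull out each part of the date by name
match = named.search(filename)
print(match.group("day"), match.group("month"), match.group("year"))
```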

Right, back into it!

files = os.listdir(parent_directory)

This line gets a list of all files in the parent directory.

files_with_dates = [f for f in files if pattern.search(f)]

This line filters the list of files to include only those that contain a date in their filename.

files_with_dates.sort(key=lambda date: datetime.strptime(pattern.search(date).group(), '%d%m%y'),reverse=True)

This line sorts the list of files by date in reverse order (most recent first).

The key function uses strptime to convert the date in each filename from a string to a datetime object, so the files sort by actual date rather than alphabetically.
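One caveat: strptime raises a ValueError when six digits don't form a real date, which would stop the sort. Here is a hedged sketch of a more defensive version that simply skips such files. The helper names and sample filenames are made up for illustration and are not part of the program above.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r'\d{2}\d{2}\d{2}')


def extract_date(filename):
    """Return the DDMMYY date in a filename, or None if there isn't a valid one."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(), '%d%m%y')
    except ValueError:  # six digits that are not a real date, e.g. 991399
        return None


def sort_newest_first(filenames):
    """Keep only filenames with a valid date and sort them most recent first."""
    dated = [f for f in filenames if extract_date(f) is not None]
    return sorted(dated, key=extract_date, reverse=True)


names = ["Clinic 010224.docx", "notes.txt", "Clinic 150324.doc", "Scan 991399.docx"]
print(sort_newest_first(names))
```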

for i in range(len(files_with_dates)):
    files_with_dates[i] = os.path.join(parent_directory, files_with_dates[i])

This loop prepends the parent directory to each filename to create a full path.

print(f"Found {len(files_with_dates)} clinic files")

This line prints the number of files found so we can check it got all the relevant files. (Obviously, comment out or strip these prints when your program is done to not clog up the log unless you need to).

return files_with_dates

Finally, the function returns the sorted list of file paths.

Awesome!

That's a pretty good win for today's session.

We've got the CHIs, we've got the files.

All that's left is:

Convert them over to the most recent file format.

Find the CHIs in the documents

Strip out the text

Build a new document with the found text.

Okay, when you put it like that this may have to be a 3 part blog rather than 2. Hahaha :D

Thanks for reading, hope you've enjoyed it. Please feel free to comment and suggest things, every day is a school day! :D

Have a brilliant weekend! See you next time.

Addendum:
Just in case anyone is wondering why some components have underscores and some have capitals: Python's naming conventions come from something called PEP 8 (Python’s style guide) - Quick guide:

Classes: Use CamelCase (also known as CapWords). 
Each word in the name starts with a capital letter, with no underscores between words. For example, MyClass.

Functions and Variables: Use snake_case. 
All letters are lowercase, with underscores between words. For example, my_function or my_variable.

I sometimes slip into old language bad habits with camel case being my preferred variable declaration (prob from my Java and Objective C days), so please forgive me for any inconsistency :))

Also, the code won't copy and paste cleanly from these blog posts due to indentation being lost. I'll put the program up on GitHub for forking when I've finished the blog posts on it. Enjoy! :D
