On Software Localisation (particularly with relevance to medical AI)

After I posted about my experience at the IET's recent Medical Devices + Data: Future-proofing Health & Care event in Cambridge, I was asked a question here on how software localisation fits into the V&V (verification and validation) and regulatory workflow for AI (Artificial Intelligence) SaMD (Software as a Medical Device) products. I'm going to assume that we're talking primarily about linguistic and cultural localisation of software UI here, although there are other aspects which I'll touch on. My answer was fairly brief (it's quoted below), but the mental tangent that followed, on how to deal with localisation when approaching regulatory submission, was interesting enough to put here.

My answer:

"I should say that software localisation was not discussed during the workshop and that Usability as a whole was not a particular focus.

However, my take is that while there appears to be a promising drift toward harmonisation between the various regulatory regimes, localisation is one of the areas where this is least likely to be fully achieved. Demonstrating usability in the face of linguistic and cultural differences is an area in which I feel it is reasonable for regulators to continue to demand individual treatment, and while standards may be harmonised, release and verification cannot both be simplified without making corrective UI changes more difficult. Localisation remains an additional complication to V&V that must be handled by careful configuration control and architectural care.

While I'm primarily talking about localisation of language here, I think this applies just as much to localisation of AI training, if such a thing is necessary."

And now off into the swirling mental vortex... How do you deal with this? What can you do to reduce the burden of localisation when you develop your AI software system?

Avoid language at all costs

First, I would always suggest that effort is better directed at coming up with a language-agnostic UI than at baking language into software. It is worth working to expunge that last text label in favour of an icon and an explanation in the manual (although note that de novo icons will tend to fall through to usability validation in any case, carrying some risk with them, so try to use internationally recognised symbols as far as possible).

Of course, sometimes localised software is the solution for the even worse problem of localised hardware - with the falling cost of deployment of rich software UI, software provides a relatively easy path to getting localised UI onto medical devices that might otherwise require physical configuration changes for different markets. If you can't get away with labelling your buttons with symbols, making them soft buttons at least means you only need one mechanical configuration in manufacture.

Architectural partitioning - is your UI really critical?

If your UI is too complex to be conveyed without words on the UI itself, a few options should be considered. The first is whether the UI carries the same safety classification as the underlying functionality. In medical mobile apps, for example, a key medical function will often be hidden beneath incidental, patient-visible functionality provided only to keep the patient in touch with the process, inspire confidence and provide a compelling reason to install the app and keep it up to date.

For example, let's imagine a phone app to allow day-to-day AI control of an active implant. Here's an initial architecture:

Here's the app functionality:

  1. Transfer measured physiological data from the implant via Bluetooth*,

  2. Filter the data,

  3. Store the data remotely for viewing by a clinician*,

  4. Use AI to determine ideal implant program based on data and settings from clinician,

  5. Adjust implant settings via Bluetooth*,

  6. Display implant status to the patient,

  7. Display notification of upcoming appointments to the patient,

  8. Provide easy access to generic advice on the patient's condition.

Functions 1-5 are probably medically critical, while 6-8 are likely not to be. Immediately, we can partition the application into two separate parts with differing levels of concern. Let's add the non-medical items in blue:

In fact, there does not need to be any link between the medical and non-medical at all within the phone itself - everything can be handled by transaction with remote services. With this separation, it should become easier to maintain the localised part of the phone software with reduced regulatory scrutiny while leaving the critical part well alone, as they can be completely divorced binaries:

A further question exists over the status of functions 1, 3 and 5, which are likely to be pure data transfer operations and so may also attract a lower level of scrutiny than functions 2 and 4. Additionally, putting the AI in the cloud makes it easier to maintain centrally, and to roll out changes with no app updates required, provided the implant interface has not changed (which would be difficult since it's stuck inside the patient). So we end up with this:
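For anyone who prefers to see the split written down rather than drawn, here is a rough Python sketch of the same partitioning. The tier names and the `shows_text_to_patient` flag are hypothetical labels for illustration only, not regulatory terms, and the assumption that only functions 6-8 carry translatable text is mine.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Hypothetical scrutiny tiers; illustrative names, not regulatory terms."""
    NON_MEDICAL = "non-medical"
    DATA_TRANSFER = "data transfer"
    MEDICAL_LOGIC = "medical logic"


@dataclass(frozen=True)
class AppFunction:
    number: int
    description: str
    tier: Tier
    shows_text_to_patient: bool  # assumption: only these need localisation


# The eight functions from the list above, tagged with an assumed tier.
APP_FUNCTIONS = [
    AppFunction(1, "Transfer physiological data from implant", Tier.DATA_TRANSFER, False),
    AppFunction(2, "Filter the data", Tier.MEDICAL_LOGIC, False),
    AppFunction(3, "Store data remotely for the clinician", Tier.DATA_TRANSFER, False),
    AppFunction(4, "AI selection of implant program", Tier.MEDICAL_LOGIC, False),
    AppFunction(5, "Adjust implant settings", Tier.DATA_TRANSFER, False),
    AppFunction(6, "Display implant status", Tier.NON_MEDICAL, True),
    AppFunction(7, "Display appointment notifications", Tier.NON_MEDICAL, True),
    AppFunction(8, "Provide generic advice", Tier.NON_MEDICAL, True),
]


def numbers_by_tier(functions):
    """Group function numbers by tier, preserving list order."""
    groups = {tier: [] for tier in Tier}
    for fn in functions:
        groups[fn.tier].append(fn.number)
    return groups


def localisation_is_isolated(functions):
    """True if every function that shows text to the patient is non-medical."""
    return all(
        fn.tier is Tier.NON_MEDICAL for fn in functions if fn.shows_text_to_patient
    )


if __name__ == "__main__":
    for tier, numbers in numbers_by_tier(APP_FUNCTIONS).items():
        print(f"{tier.value}: {numbers}")
    print("localisation isolated:", localisation_is_isolated(APP_FUNCTIONS))
```

A check like `localisation_is_isolated` could sit in a design review or CI step as a cheap guard against translatable text creeping into the critical part, although it is no substitute for the actual safety classification exercise.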

Nope, can't do that - I still need localised software

If none of that helps and your software still requires localisation and regulatory acceptance, my gut feel is that it is better to localise at build time and keep the variants distinct, such that changes to a single language do not muddy the waters for global deployment. However, determining how V&V and submission should work is a question for a regulatory expert, or something I'd approach a notified body or the FDA to discuss early on, while determining the architecture.
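One way to keep build-time variants distinct under configuration control is to record, for each locale, which core version and which string table went into it. The following is a minimal sketch of that idea, not a description of any particular toolchain: the locale codes, string keys, version number and function names are all hypothetical, and a real build would fingerprint binaries rather than dictionaries.

```python
import hashlib
import json

CORE_VERSION = "1.4.0"  # hypothetical version of the shared, non-localised core

# Hypothetical per-locale string tables, one per build variant.
STRINGS = {
    "en-GB": {"status_ok": "Implant OK", "next_visit": "Next appointment"},
    "de-DE": {"status_ok": "Implantat OK", "next_visit": "Nächster Termin"},
}


def fingerprint(table):
    """Short, stable SHA-256 fingerprint of a string table."""
    blob = json.dumps(table, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def build_manifest(core_version, strings):
    """One manifest entry per locale variant: core version plus strings fingerprint."""
    return {
        locale: {"core": core_version, "strings": fingerprint(table)}
        for locale, table in strings.items()
    }


def changed_variants(before, after):
    """Locales whose manifest entry differs between two builds."""
    return sorted(loc for loc in after if before.get(loc) != after[loc])


if __name__ == "__main__":
    before = build_manifest(CORE_VERSION, STRINGS)
    # Correct a single German string and rebuild.
    edited = {**STRINGS, "de-DE": {**STRINGS["de-DE"], "status_ok": "Implantat in Ordnung"}}
    after = build_manifest(CORE_VERSION, edited)
    print(changed_variants(before, after))
```

Editing one German string and rebuilding the manifest flags only the de-DE variant as changed. Whether that is enough to limit the reverification scope is still the regulatory question above.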

If your regulatory representative can agree that testing of common core functionality does not need to be repeated for linguistically different binaries, great. If not, you may need to look at the testing burden balanced against localised UI complexity - it might be better to roll all your languages in at build time and configure at run-time, such that you only need to verify the core once but accept that making changes to a single language is going to mean reverifying the whole thing. If anyone can provide insight on this (or sees anything wrong with anything I've written above), I'd love to hear it. Otherwise, I'll research it further at some point.