Privacy by design: ensuring safety with secure internal AI models for LLMs

‍

As artificial intelligence (AI) continues to advance, large language models (LLMs) like GPT-4 have become vital tools across various sectors, including healthcare, finance, and public administration. These models help increase efficiency by automating tasks that humans usually do, improving decision-making, and provide valuable insights.

At Invision360, we recognise the potential of these models to revolutionise how local authorities deliver Special Educational Needs and Disabilities (SEND) services.

However, we recognise that LLMs and AI are not inherently risk-free and are making use of the latest innovations to design safe and responsible technology so that we can work with LAs to improve outcomes for children and young people with SEND.

In this article, we explore the differences between publicly and privately hosted LLMs, evaluating the risks to local authorities and children and young people’s data.

Discover how secure internal AI models offer significant privacy advantages, ensuring data protection and enhancing service delivery.

Publicly vs. Privately hosted LLMs

When it comes to managing privacy and security risks in LLMs, choosing between public vs private hosting plays a pivotal role. We'll start by exploring the differentiation between the two.

Publicly hosted LLMs

Publicly-hosted LLMs and related services, such as Chat-GPT, are managed by third-party providers and accessed via cloud services. With these services, controlling data or privacy risks is challenging because the providers of the services set the terms of use and privacy policies. With publicly hosted LLMs, users have limited control over how their data is used, how the LLM might utilise their prompts in the future, and who can access the data (either now or in the future). This uncertainty makes it risky, in particular, to use sensitive data with publicly hosted LLMs.

Privately hosted LLMs

Privately-hosted LLMs are deployed on an organisation's infrastructure or a private cloud. This setup gives organisations greater control over the model and the data it processes. With privately-hosted LLMs, data stays within a controlled environment, through a combination of technical and policy controls that can meet the specific security and regulatory obligations on local authorities.

Understanding the risks of using publicly hosted LLMs

Overall, use of publicly-hosted LLMs pose greater privacy risks for local authorities in comparison to privately-hosted LLMs. When using a publicly hosted LLM, the data you input, and any outputs produced are stored on an external server controlled by a third-party provider (Open-AI in the case of Chat-GPT; Anthropic in the case of Claude 3). This means that local authorities do not have direct control over how their data is handled, stored, and secured.

In the case of publicly-hosted LLMs, the security of local authority data depends on the third-party provider's security measures. Any vulnerabilities or breaches in their systems could expose sensitive data to unauthorised access or leaks.

At Invision360, we've designed our security measures in partnership with local authority technical teams, to meet the rigorous compliance and security standards that come with processing local authority and SEND data.

There can also be risks that third-party providers have policies that allow them to use, analyse, or share data that local authorities enter into their products with other entities. Any policies that a local authority signs up to, can often be changed unexpectedly or updated at a later date. This can increase the risk of data being used in ways the local authorities did not intend or expect, or not in line with regulatory obligations.

Data leakage

Data leakage occurs when sensitive information processed by a model is unintentionally exposed. For example, if an LLM is used to process children and young people’s medical reports, it might inadvertently share private medical information.

Inference attacks

Inference attacks can happen when an attacker exploits an LLM to extract sensitive information. This could involve piecing together data points to reveal private details, leveraging the model’s ability to generate contextually relevant responses.

Unintended outputs

Unintended outputs are when an LLM generates text that deviates from the intended purpose, for example, producing answers that contain highly technical information when the aim is to communicate to someone with English as an additional language. This can occur due to biases in the training data, limitations in the model's understanding, or unexpected interactions with input data.

Model updates and data handling

Regular updates are essential to improve LLM performance and security. However, updating models can introduce new privacy risks if the training data is not handled securely. Proper anonymisation and data security during updates are crucial.

The distinctive advantages of using a privately hosted LLM

Privately-hosted LLMs, like VITA by Invision360, mitigate these risks by enabling the owner to have direct control over data handling, security, and compliance practices.

Using a privately hosted LLM enables you to meet the highest standards of safety security and compliance in the following ways:

Deployment of private cloud infrastructure: The platform will be deployed on a cloud infrastructure that is dedicated to a single local authority.
Control over data: Ensure a partnership between both parties to design and implement appropriate data and security controls, including encryption, access controls, and network security.
Regulatory compliance: Working together, platform developers and local authority Information Governance teams can ensure compliance with relevant data protection regulations (e.g., GDPR, DPA).
Enhanced privacy: By hosting the LLM on private infrastructure, this ensures that sensitive or confidential information is not ever exposed to external organisations, and our contracts give local authorities a long-term guarantee.

Let's use VITA by Invision360 as an example:

In addition to the standard advantages of using private hosting, with VITA Invision360's in-house technical team have implemented some additional controls that ensure specific risks associated with LLMs are reduced:

Privacy by design

Privacy by design is a framework that embeds privacy into the design and operation of technology systems, throughout the entire lifecycle of a product. Invision360 has been working with local authorities to design appropriate controls from initial conception of VITA by Invision360 to deployment, ensuring it meets rigorous standards for safety and responsible use of AI. These include:

LLM model isolation

As part of our privacy by design approach, VITA by Invision360 uses pre-trained models only, which means no Children or young people’s data is used in training the model and no children or young people’s data is ever fed back into the model. This eliminates the risk of data leakage.

Human-in-the-Loop AI

With any AI system, the best outcomes are when AI enhances an expert’s existing knowledge. A privately-hosted LLM can be designed to complement, not replace, the expertise of SEND caseworkers. VITA by Invision360 incorporates human review steps at multiple stages of the process, to ensure outputs meet high standards and EHCP’s are tailored to each Children or young people’s needs.

Testing and validation

VITA by Invision360 goes through rigorous testing and validation processes before deployment (going live to local authorities). Invision360 has in-house expertise in AI safety and has designed a set of adversarial testing protocols (where we try to ‘break’ the model). We do this alongside ongoing evaluation with real-world users to ensure the outputs behave as expected and deliver high quality outcomes for children or young people. Between all of these processes, we are able to identify and rectify any potential biases and ensure safe and high quality outputs.

Explainable and transparent AI

We believe that safe and responsible use of AI can be transformational for local authorities. To support local authorities to build trust in the systems they are using, VITA by Invision360 includes several mechanisms to ensure transparency and safety:

VITA by Invision360 checks the quality of caseworker advice and matches it to the correct children or young people's details.
Caseworkers can edit and must approve any draft text generated by VITA.
VITA by Invision360 includes a ‘model explainability’ section to provide local authorities with insights into how the model works.

Conclusion

While publicly hosted models offer convenience for local authorities, they also pose challenges in terms of data control and privacy management. While LLMs hold promise for revolutionising service delivery and improving outcomes for children or young people with SEND, managing risks around this sensitive data is crucial.

Privately hosted LLMs, like VITA, built within isolated infrastructure, offer a compelling solution to this challenge by giving local authorities autonomy and control over data processing, security and compliance standards.

By embracing LLMs alongside robust privacy measures, we can drive transformative advancements in service delivery while upholding data integrity.

New to LLMs? Find out how they work and some of the key ways they can help LAs in our recent blog post 'Harnessing generative AI for EHCPs: A deep dive into Large Language Models'

‍

Lydia Polom

Marketing Manager

Privacy by design: ensuring safety with secure internal AI models for LLMs

Privacy by design: ensuring safety with secure internal AI models for LLMs