Last week, Cloudflare and Meta announced that Llama 2 is now available on the Cloudflare Workers platform. This makes Llama 2 accessible to developers, opening up new opportunities to build LLM-augmented experiences directly into the application layer.
Over at Twilio Segment, we have been really excited about the opportunities edge workers can unlock for analytics instrumentation & real-time personalization. Last year, we introduced our Edge SDK to help app builders collect high-quality first-party data and run advanced personalization & experimentation with user context.
This year, as we’ve developed CustomerAI, we’ve been learning from some of the most advanced engineering teams in the world about how they plan to leverage LLMs and AI to adapt the application experience.
We heard three themes, loud and clear. To put LLMs into production in the application, developers are going to have to tackle three new challenges:
- LLMs need to operate closer to real time. Roundtrips and slow inference can break the customer experience - every millisecond counts!
- LLMs lack customer and domain context. AI experiences need to know where a customer is in the journey, what they’re looking for, and how to properly care for them.
- LLMs struggle to “close the loop”. Dev and marketing teams need to understand how an AI experience impacts the overall customer journey (e.g. conversion rate).
This is in addition to figuring out how to avoid hallucinations, enforce model guardrails, and bake in customer trust and transparency – all areas that we’ve had to tackle in building Twilio CustomerAI.
In this post, we’re going to show you how to build a high-context, low-latency LLM app experience in just a little bit of code. We’ll combine Segment’s Edge SDK, Cloudflare Workers, and Llama 2, and talk through a couple of potential use cases you might explore on your own.
Ready to start building?
Meet Your Llama 🦙
Llama 2 is accessible from Cloudflare Workers, Node.js, Python, or your favorite programming language via the REST API. You can find setup examples in the Cloudflare docs. What’s exciting about accessing your LLM through Cloudflare Workers is that the model runs at the edge, so latency is naturally minimized.
- There’s a great free playground to experiment with Workers here.
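To make this concrete, here’s a minimal sketch of a Worker that prompts Llama 2 through the Workers AI binding. It assumes the `@cloudflare/ai` package, an `AI` binding configured in your `wrangler.toml`, and the `@cf/meta/llama-2-7b-chat-int8` model; check the Cloudflare docs for the current model catalog and setup details.

```typescript
// A minimal Worker that prompts Llama 2 via Workers AI (sketch).
// Assumes an AI binding configured in wrangler.toml.
import { Ai } from "@cloudflare/ai";

export interface Env {
  AI: any; // the Workers AI binding
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const ai = new Ai(env.AI);

    // Run the Llama 2 chat model hosted on Cloudflare's edge
    const result = await ai.run("@cf/meta/llama-2-7b-chat-int8", {
      prompt: "What should I know about llamas?",
    });

    return new Response(JSON.stringify(result), {
      headers: { "content-type": "application/json" },
    });
  },
};
```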
This certainly makes it easy to go and build something, but it doesn’t solve the whole problem on its own. In order for the LLM to do its magic, it needs context… without it, all the suggestions will be hollow, empty, and impersonal. 😅
Exploring the Edge 🧗
This need for context-awareness leads us directly to the value of Segment, and the premise of Twilio’s CustomerAI: people respond better when you treat them personally. Segment enables you to do just that by collecting everything you know about each customer in one profile that gets shared out to your whole stack.
A major technical limitation we encounter is that personalizations need to be decisioned and delivered with sub-second latency: you have to retrieve the profile before you pass it into your LLM, and then decide how to respond once the LLM has processed the inputs. Your personalization effort may end before it begins if long wait times cause the customer to abandon the request and go elsewhere!
This is where the Edge SDK helps - by moving the profile to the edge and minimizing the latency of profile lookups, you can use the profile more liberally to personalize without risking long waits.
In order to use the Edge SDK, spin up a Cloudflare Worker and import it as follows:
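Here’s a minimal sketch of that setup. It assumes the `@segment/edge-sdk` package and its `Segment` class; `SEGMENT_WRITE_KEY` is a binding name we’ve chosen for illustration. See the Edge SDK documentation for the full configuration options, including profile storage and Personas credentials.

```typescript
// A minimal Worker wired up with Segment's Edge SDK (sketch).
// SEGMENT_WRITE_KEY is an illustrative env binding holding your
// Segment source write key - configure it in wrangler.toml.
import { Segment } from "@segment/edge-sdk";

export interface Env {
  SEGMENT_WRITE_KEY: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const segment = new Segment({
      writeKey: env.SEGMENT_WRITE_KEY, // your Segment source write key
      routePrefix: "segment", // path prefix the SDK claims for its endpoints
    });

    // Let the SDK serve its analytics routes and attach the visitor's
    // edge profile to the request before your app logic runs
    return segment.handleEvent(request);
  },
};
```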