Hosting Semantic Kernel in Orleans and Aspire
Function Calling elevates the usefulness of AI Chat Agents by letting them not only query your internal system data, but also make changes on behalf of a user. For instance, given a home automation system with an API, you could capture voice data, transcribe it to text, and supply that text to an agent, which then calls functions to adjust the lights in your house. This post covers the basics of such a system.
The distributed nature of Orleans provides an excellent runtime for interacting with Chat Agents. Even at small scale, representing each interaction with a particular agent as a Grain lets us compartmentalize all of the associated data. One Grain type might be backed by a code-centric LLM that returns JSON data, while another uses a general-purpose LLM to interact with the home automation functions. As the application scales out, we can host per-user or per-customer-organization chat agents without changing our implementation, simply by leveraging Orleans.
So let's get started.
At the most basic level, our chat agent grain needs to accept a user message and return the agent's response. We'll also include the ability to return the full chat history. You may decide to have multiple grain interfaces, one for each type of agent you host, but for simplicity we'll stick with a single agent.
public interface IChatAgentGrain : IGrainWithStringKey
{
    Task<string> Submit(string userMessage);

    Task<ChatMessage[]> GetHistory();
}
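The ChatMessage type returned by GetHistory isn't part of Orleans or Semantic Kernel; it's a small model of our own. A minimal sketch, inferred from the Type and Content fields used later in this post, and assuming Orleans 7+'s source-generated serialization, might look like this:

[GenerateSerializer]
public class ChatMessage
{
    [Id(0)]
    public ChatMessageType Type { get; set; }

    [Id(1)]
    public string Content { get; set; } = string.Empty;
}

public enum ChatMessageType
{
    User,
    System
}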
To enable our LLM to call back into our (fake) home automation system, we'll need to develop a Plugin for Semantic Kernel. In the real world, this plugin would make an API call to a real home automation system, but for simplicity we'll just update some internal data.
using System.ComponentModel;
using Microsoft.SemanticKernel;

public class LightsPlugin
{
    private readonly IEnumerable<LightModel> _lights;

    public LightsPlugin(IEnumerable<LightModel> lights)
    {
        _lights = lights;
    }

    [KernelFunction("get_lights")]
    [Description("Gets a list of lights and their current state")]
    public Task<LightModel[]> GetLightsAsync()
    {
        return Task.FromResult(_lights.ToArray());
    }

    [KernelFunction("change_state")]
    [Description("Changes the state of the light")]
    public Task<LightModel?> ChangeStateAsync(LightModel changeState)
    {
        // Find the light to change
        var light = _lights.FirstOrDefault(l => l.Id == changeState.Id);

        // If the light does not exist, return null
        if (light == null)
        {
            return Task.FromResult<LightModel?>(null);
        }

        // Update the light state
        light.IsOn = changeState.IsOn;
        light.Brightness = changeState.Brightness;
        light.Color = changeState.Color;

        return Task.FromResult<LightModel?>(light);
    }
}
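LightModel is likewise our own type. Here's a minimal sketch matching the properties used above and in the seed data later in the post; the [Description] attributes are optional, but they give the LLM hints about what each field means:

using System.ComponentModel;

public class LightModel
{
    public int Id { get; set; }

    public string Name { get; set; } = string.Empty;

    [Description("Whether the light is currently on")]
    public bool IsOn { get; set; }

    [Description("The brightness level of the light")]
    public Brightness Brightness { get; set; }

    [Description("The color of the light as a hex string, e.g. #FFFFFF")]
    public string Color { get; set; } = "#FFFFFF";
}

public enum Brightness
{
    Low,
    Medium,
    High
}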
Now let's turn our eyes toward developing the ChatAgentGrain, which executes in Orleans and combines state (chat history) with LLM calls.
The key to allowing the LLM to automatically call our Semantic Kernel Plugin is to pass in PromptExecutionSettings that enable automatic Function Calling.
private readonly PromptExecutionSettings _executionSettings = new()
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
};
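With Auto(), the model decides whether to call a function, and Semantic Kernel automatically invokes the chosen function and feeds the result back to the model. If you need different behavior, FunctionChoiceBehavior also exposes Required() and None():

// Force the model to call one or more of the advertised functions
private readonly PromptExecutionSettings _requiredSettings = new()
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.Required()
};

// Advertise the functions to the model, but never invoke them
private readonly PromptExecutionSettings _noCallSettings = new()
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.None()
};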
Our ChatAgentGrain will accept a persistent state parameter to store associated data such as our chat history, developer prompt, and chat history length. It also takes in an injected instance of the Kernel type, giving us access to the underlying IChatCompletionService and associated LLM.
private readonly IPersistentState<AgentState> _agentData;
private readonly Kernel _kernel;
private readonly IChatCompletionService _chatCompletionService;

public ChatAgentGrain(
    [PersistentState("history")] IPersistentState<AgentState> agentData,
    Kernel kernel)
{
    _agentData = agentData;
    _kernel = kernel;
    _chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
}
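The AgentState type holds everything the grain persists. A minimal sketch, assuming only the members used in Submit below (Messages, DeveloperPrompt, and MaxChatHistoryLength; the default of 20 is an arbitrary choice):

[GenerateSerializer]
public class AgentState
{
    [Id(0)]
    public ChatMessage[] Messages { get; set; } = [];

    [Id(1)]
    public string DeveloperPrompt { get; set; } = string.Empty;

    [Id(2)]
    public int MaxChatHistoryLength { get; set; } = 20;
}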
The implementation of the Submit method is a bit long-winded, but with good reason. It needs to add the new message and truncate the chat history, set up our Semantic Kernel call with the Developer Prompt and Chat History, call the Chat Completion Service, update the internal state, and return the response.
public async Task<string> Submit(string userMessage)
{
    // Take the last N messages from our history,
    // add our new message and save it to the state
    _agentData.State.Messages = [
        .._agentData.State.Messages.TakeLast(_agentData.State.MaxChatHistoryLength - 1),
        new() {
            Type = ChatMessageType.User,
            Content = userMessage
        }
    ];

    await _agentData.WriteStateAsync();

    // Build the history for the kernel
    ChatHistory history = new ChatHistory();

    // add our developer prompt
    if (!string.IsNullOrWhiteSpace(_agentData.State.DeveloperPrompt))
    {
        history.AddDeveloperMessage(_agentData.State.DeveloperPrompt);
    }

    // add our history, this will include our new message
    foreach (var msg in _agentData.State.Messages)
    {
        history.AddMessage(
            msg.Type == ChatMessageType.System ? AuthorRole.System : AuthorRole.User,
            msg.Content
        );
    }

    // do AI things and get the response
    var response = await _chatCompletionService.GetChatMessageContentsAsync(
        history,
        kernel: _kernel,
        executionSettings: _executionSettings
    );

    var agentResponseContent = response[^1]?.Content ?? "Error - could not retrieve a response.";

    // write the agent's response to the history
    _agentData.State.Messages = [
        .._agentData.State.Messages,
        new() { Content = agentResponseContent, Type = ChatMessageType.System }
    ];

    await _agentData.WriteStateAsync();

    return agentResponseContent;
}
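The GetHistory method from the interface isn't shown above; one way to implement it is to simply return the persisted messages:

public Task<ChatMessage[]> GetHistory()
{
    // Return a copy of the persisted conversation
    return Task.FromResult(_agentData.State.Messages.ToArray());
}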
In our Silo's Program.cs file, we'll configure Semantic Kernel to use OpenAI Chat Completion and add support for our Plugin. We'll also enable Orleans, which is configured primarily via the Aspire AppHost.
builder.Services.AddOpenAIChatCompletion(
    modelId: "gpt-4o-mini",
    apiKey: builder.Configuration["OpenAI:ApiKey"] ?? "NO_KEY"
);

builder.Services.AddTransient((serviceProvider) =>
{
    var kernel = new Kernel(serviceProvider);

    kernel.Plugins.AddFromObject(new LightsPlugin([
        new LightModel { Id = 1, Name = "Table Lamp", IsOn = false, Brightness = Brightness.Medium, Color = "#FFFFFF" },
        new LightModel { Id = 2, Name = "Porch light", IsOn = false, Brightness = Brightness.High, Color = "#FF0000" },
        new LightModel { Id = 3, Name = "Chandelier", IsOn = true, Brightness = Brightness.Low, Color = "#FFFF00" }
    ]));

    return kernel;
});
builder.UseOrleans();
Now we add an API endpoint to support calling our Grain and Chat Agent. Note that the endpoint is mapped on the built WebApplication (app), not the builder. In the real world, we'd likely isolate grains based on user id or tenancy identifiers, ensuring that two different callers cannot access the same Chat Agent. For the purposes of this article, we'll take in an agentId route parameter that identifies the agent.
// Submit a chat message and get a response
// (mapped after var app = builder.Build())
app.MapPost("api/agents/{agentId}/chat",
    async (
        [FromServices] IClusterClient cluster,
        [FromRoute] string agentId,
        [FromBody] UserChatRequest request) =>
    {
        var agent = cluster.GetGrain<IChatAgentGrain>(agentId);
        var response = await agent.Submit(request.message);

        return new { message = response };
    }
).WithName("Agent_SubmitChat");
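UserChatRequest is a simple body contract; assuming the single message property used above, a minimal record will do:

public sealed record UserChatRequest(string message);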
Our AppHost's Program.cs is simple: it configures our OpenAI API Key and Orleans, and passes those resources on to our Silo project.
var builder = DistributedApplication.CreateBuilder(args);

var openAiApiKey = builder.AddParameter("OpenAiApiKey");

var orleans = builder.AddOrleans("orleans")
    .WithDevelopmentClustering()
    .WithMemoryGrainStorage("Default");

builder.AddProject<Projects.OrleansSamples_SemanticKernelAgents_Silo>("silo")
    .WithReference(orleans)
    .WithEnvironment("OpenAI__ApiKey", openAiApiKey);

builder.Build().Run();
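Once the AppHost is running, you can exercise the agent with any HTTP client. Here's a quick sketch using HttpClient; the port is whatever Aspire assigns to the silo locally (check the dashboard; 7042 is just a placeholder), and "kitchen" is an example agent id:

using System.Net.Http.Json;

var client = new HttpClient { BaseAddress = new Uri("https://localhost:7042") };

// Ask the agent to call our LightsPlugin on our behalf
var response = await client.PostAsJsonAsync(
    "api/agents/kitchen/chat",
    new { message = "Turn on the table lamp and make it blue." });

var body = await response.Content.ReadFromJsonAsync<AgentReply>();
Console.WriteLine(body?.message);

// Matches the anonymous { message = ... } shape returned by the endpoint
public sealed record AgentReply(string message);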
To view the full source code for this sample, please visit github.com/jsedlak/orleans-samples/tree/main/ai/semantic-kernel.
My hope is that this sample provides a nice jumping-off point for ideas that utilize the distributed execution and scaling of Orleans combined with the power of Semantic Kernel and Function Calling.
As a real-world example, I've been incorporating Semantic Kernel as Grains in Reach to support a chat agent that is capable of reporting, importing, and exporting content in a number of ways, including via natural language. By providing the agent with plugins that can create component definitions and content, my hope is that content authors can turn ideas into reality more efficiently. You can view the source code for this integration here.