SDK Reference

Core Module

class AlephAlphaChatModel(name: str, client: AlephAlphaClientProtocol | None = None)[source]

Bases: ChatModel, ControlModel

Abstract base class for any model that supports chat and runs via the Aleph Alpha API.

complete(input: CompleteInput, tracer: Tracer) CompleteOutput
complete_task() Task[CompleteInput, CompleteOutput]
echo(prompt: str, expected_completion: str, tracer: Tracer) Sequence[tuple[Token, float | None]]

Echos the log probs for each token of an expected completion given a prompt.

Parameters:
  • prompt – The prompt to echo

  • expected_completion – The expected completion to get log probs for

  • tracer – Valid instance of a tracer

Returns:

A list of tuples with token identifier and log probability

echo_chat(messages: list[Message], response_prefix: str | None, expected_completion: str, tracer: Tracer) Sequence[tuple[Any, float | None]][source]

Echos the log probs for each token of an expected completion given a prompt.

Parameters:
  • messages – The messages to be used as prompt

  • response_prefix – Append the given string to the beginning of the final agent message to steer the generation.

  • expected_completion – The expected completion to get log probs for

  • tracer – Valid instance of a tracer

Returns:

A list of tuples with token identifier and log probability

explain(input: ExplainInput, tracer: Tracer) ExplainOutput
generate(prompt: str, tracer: Tracer) str

A completion function that takes a prompt and generates a completion.

Parameters:
  • prompt – The prompt to generate a completion for

  • tracer – Valid instance of a tracer

Returns:

An LLM completion

generate_chat(messages: Sequence[Message], response_prefix: str | None, tracer: Tracer) str[source]

Generate a raw completion to messages for any AlephAlphaChatModel.

Parameters:
  • messages – A number of messages to use as prompt for the model

  • response_prefix – Optional argument to append a string to the beginning of the final agent message to steer the generation

  • tracer – Valid instance of a tracer

Returns:

An LLM completion

get_tokenizer() Tokenizer
get_tokenizer_no_whitespace_prefix() Tokenizer
to_chat_prompt(messages: Sequence[Message], response_prefix: str | None = None) RichPrompt[source]

Method to create a chat-RichPrompt object to use with any AlephAlphaModel.

Parameters:
  • messages – A number of messages to use as prompt for the model

  • response_prefix – Append the given string to the beginning of the final agent message to steer the generation. Defaults to None.

Returns:

A RichPrompt object to be consumed by the Aleph Alpha client

abstract to_finetuning_sample(messages: Sequence[Message]) Sequence[FinetuningMessage][source]

Abstract function allowing a user to define what the model’s finetuning samples should look like.

Parameters:

messages – The messages making up the finetuning sample

Returns:

A finetuning sample containing the input messages

to_instruct_prompt(instruction: str, input: str | None = None, response_prefix: str | None = None, instruction_controls: Sequence[TextControl] | None = None, input_controls: Sequence[TextControl] | None = None) RichPrompt[source]

Method to use a chat model like an instruct model`.

Parameters:
  • instruction – The task the model should fulfill, for example summarization

  • input – Any context necessary to solve the task, such as the text to be summarized

  • response_prefix – Optional argument to append a string to the beginning of the final agent message to steer the generation

  • instruction_controls – Instruction controls are not used but needed for the interface.

  • input_controls – Input controls are not used but needed for the interface

Returns:

The rendered prompt with all variables filled in.

tokenize(text: str, whitespace_prefix: bool = True) Encoding
class AlephAlphaModel(name: str, client: AlephAlphaClientProtocol | None = None)[source]

Bases: LanguageModel

Model-class for any model that uses the Aleph Alpha client.

Any class of Aleph Alpha model is implemented on top of this base class. Exposes methods that are available to all models, such as complete and tokenize. It is the central place for all things that are physically interconnected with a model, such as its tokenizer or prompt format used during training.

Parameters:
  • name – The name of a valid model that can access an API using an implementation of the AlephAlphaClientProtocol.

  • client – Aleph Alpha client instance for running model related API calls. Defaults to LimitedConcurrencyClient

complete(input: CompleteInput, tracer: Tracer) CompleteOutput[source]
complete_task() Task[CompleteInput, CompleteOutput][source]
echo(prompt: str, expected_completion: str, tracer: Tracer) Sequence[tuple[Token, float | None]][source]

Echos the log probs for each token of an expected completion given a prompt.

Parameters:
  • prompt – The prompt to echo

  • expected_completion – The expected completion to get log probs for

  • tracer – Valid instance of a tracer

Returns:

A list of tuples with token identifier and log probability

explain(input: ExplainInput, tracer: Tracer) ExplainOutput[source]
generate(prompt: str, tracer: Tracer) str[source]

A completion function that takes a prompt and generates a completion.

Parameters:
  • prompt – The prompt to generate a completion for

  • tracer – Valid instance of a tracer

Returns:

An LLM completion

get_tokenizer() Tokenizer[source]
get_tokenizer_no_whitespace_prefix() Tokenizer[source]
tokenize(text: str, whitespace_prefix: bool = True) Encoding[source]
class ChatModel(name: str)[source]

Bases: LanguageModel

Abstract base class to implement any model that supports chat.

abstract echo(prompt: str, expected_completion: str, tracer: Tracer) Sequence[tuple[Any, float | None]]

Echos the log probs for each token of an expected completion given a prompt.

Parameters:
  • prompt – The prompt to echo

  • expected_completion – The expected completion to get log probs for

  • tracer – Valid instance of a tracer

Returns:

A list of tuples with token identifier and log probability

abstract echo_chat(messages: list[Message], response_prefix: str | None, expected_completion: str, tracer: Tracer) Sequence[tuple[Any, float | None]][source]

Echos the log probs for each token of an expected completion given a prompt.

Parameters:
  • messages – The messages to be used as prompt

  • response_prefix – Append the given string to the beginning of the final agent message to steer the generation.

  • expected_completion – The expected completion to get log probs for

  • tracer – Valid instance of a tracer

Returns:

A list of tuples with token identifier and log probability

abstract generate(prompt: str, tracer: Tracer) str

A completion function that takes a prompt and generates a completion.

Parameters:
  • prompt – The prompt to generate a completion for

  • tracer – Valid instance of a tracer

Returns:

An LLM completion

abstract generate_chat(messages: list[Message], response_prefix: str | None, tracer: Tracer) str[source]

A completion function that takes a prompt and generates a completion.

Parameters:
  • messages – The messages to be used as prompt

  • response_prefix – Append the given string to the beginning of the final agent message to steer the generation.

  • tracer – Valid instance of a tracer

Returns:

An LLM completion

class Chunk(model: AlephAlphaModel, max_tokens_per_chunk: int = 512)[source]

Bases: Task[ChunkInput, ChunkOutput]

Splits a longer text into smaller text chunks.

Provide a text of any length and chunk it into smaller pieces using a tokenizer that is available within the Aleph Alpha client.

Parameters:
  • model – A valid Aleph Alpha model.

  • max_tokens_per_chunk – The maximum number of tokens to fit into one chunk.

do_run(input: ChunkInput, task_span: TaskSpan) ChunkOutput[source]

The implementation for this use case.

This takes an input and runs the implementation to generate an output. It takes a Span for tracing of the process. The Input and Output are logged by default.

Parameters:
  • input – Generic input defined by the task implementation

  • task_span – The Span used for tracing.

Returns:

Generic output defined by the task implementation.

run(input: Input, tracer: Tracer) Output

Executes the implementation of do_run for this use case.

This takes an input and runs the implementation to generate an output. It takes a Tracer for tracing of the process. The Input and Output are logged by default.

Parameters:
  • input – Generic input defined by the task implementation

  • tracer – The Tracer used for tracing.

Returns:

Generic output defined by the task implementation.

run_concurrently(inputs: Iterable[Input], tracer: Tracer, concurrency_limit: int = 20) Sequence[Output]

Executes multiple processes of this task concurrently.

Each provided input is potentially executed concurrently to the others. There is a global limit on the number of concurrently executed tasks that is shared by all tasks of all types.

Parameters:
  • inputs – The inputs that are potentially processed concurrently.

  • tracer – The tracer passed on the run method when executing a task.

  • concurrency_limit – An optional additional limit for the number of concurrently executed task for this method call. This can be used to prevent queue-full or similar error of downstream APIs when the global concurrency limit is too high for a certain task.

Returns:

The Outputs generated by calling run for each given Input. The order of Outputs corresponds to the order of the Inputs.

class ChunkInput(*, text: str)[source]

Bases: BaseModel

The input for a Chunk-task.

text

A text of arbitrary length.

Type:

str

class ChunkOutput(*, chunks: Sequence[TextChunk])[source]

Bases: BaseModel

The output of a ChunkTask.

chunks

A list of smaller sections of the input text.

Type:

collections.abc.Sequence[pharia_inference_sdk.core.chunk.TextChunk]

class ChunkWithIndices(model: AlephAlphaModel, max_tokens_per_chunk: int = 512)[source]

Bases: Task[ChunkInput, ChunkWithIndicesOutput]

Splits a longer text into smaller text chunks and returns the chunks’ start indices.

Provide a text of any length and chunk it into smaller pieces using a tokenizer that is available within the Aleph Alpha client. For each chunk, the respective start index relative to the document is also returned.

Parameters:
  • model – A valid Aleph Alpha model.

  • max_tokens_per_chunk – The maximum number of tokens to fit into one chunk.

do_run(input: ChunkInput, task_span: TaskSpan) ChunkWithIndicesOutput[source]

The implementation for this use case.

This takes an input and runs the implementation to generate an output. It takes a Span for tracing of the process. The Input and Output are logged by default.

Parameters:
  • input – Generic input defined by the task implementation

  • task_span – The Span used for tracing.

Returns:

Generic output defined by the task implementation.

run(input: Input, tracer: Tracer) Output

Executes the implementation of do_run for this use case.

This takes an input and runs the implementation to generate an output. It takes a Tracer for tracing of the process. The Input and Output are logged by default.

Parameters:
  • input – Generic input defined by the task implementation

  • tracer – The Tracer used for tracing.

Returns:

Generic output defined by the task implementation.

run_concurrently(inputs: Iterable[Input], tracer: Tracer, concurrency_limit: int = 20) Sequence[Output]

Executes multiple processes of this task concurrently.

Each provided input is potentially executed concurrently to the others. There is a global limit on the number of concurrently executed tasks that is shared by all tasks of all types.

Parameters:
  • inputs – The inputs that are potentially processed concurrently.

  • tracer – The tracer passed on the run method when executing a task.

  • concurrency_limit – An optional additional limit for the number of concurrently executed task for this method call. This can be used to prevent queue-full or similar error of downstream APIs when the global concurrency limit is too high for a certain task.

Returns:

The Outputs generated by calling run for each given Input. The order of Outputs corresponds to the order of the Inputs.

class ChunkWithIndicesOutput(*, chunks_with_indices: Sequence[ChunkWithStartEndIndices])[source]

Bases: BaseModel

The output of a ChunkWithIndices-task.

chunks_with_indices

A list of smaller sections of the input text with the respective start_index.

Type:

collections.abc.Sequence[pharia_inference_sdk.core.chunk.ChunkWithStartEndIndices]

class ChunkWithStartEndIndices(*, chunk: TextChunk, start_index: int, end_index: int)[source]

Bases: BaseModel

A TextChunk and its start_index and end_index within the given text.

chunk

The actual text.

Type:

pharia_inference_sdk.core.chunk.TextChunk

start_index

The character start index of the chunk within the given text.

Type:

int

end_index

The character end index of the chunk within the given text.

Type:

int

class CompleteInput(*, prompt: Prompt, maximum_tokens: int | None = None, temperature: float = 0.0, top_k: int = 0, top_p: float = 0.0, presence_penalty: float = 0.0, frequency_penalty: float = 0.0, repetition_penalties_include_prompt: bool = False, use_multiplicative_presence_penalty: bool = False, penalty_bias: str | None = None, penalty_exceptions: List[str] | None = None, penalty_exceptions_include_stop_sequences: bool | None = None, best_of: int | None = None, n: int = 1, logit_bias: Dict[int, float] | None = None, log_probs: int | None = None, stop_sequences: List[str] | None = None, tokens: bool = False, disable_optimizations: bool = False, minimum_tokens: int = 0, echo: bool = False, use_multiplicative_frequency_penalty: bool = False, sequence_penalty: float = 0.0, sequence_penalty_min_length: int = 2, use_multiplicative_sequence_penalty: bool = False, completion_bias_inclusion: Sequence[str] | None = None, completion_bias_inclusion_first_token_only: bool = False, completion_bias_exclusion: Sequence[str] | None = None, completion_bias_exclusion_first_token_only: bool = False, contextual_control_threshold: float | None = None, control_log_additive: bool | None = True, repetition_penalties_include_completion: bool = True, raw_completion: bool = False, steering_concepts: List[str] | None = None)[source]

Bases: BaseModel, CompletionRequest

The input for a Complete task.

to_completion_request() CompletionRequest[source]
class CompleteOutput(*, model_version: str, completions: Sequence[CompletionResult], num_tokens_prompt_total: int, num_tokens_generated: int, optimized_prompt: Prompt | None = None)[source]

Bases: BaseModel, CompletionResponse

The output of a Complete task.

static from_completion_response(completion_response: CompletionResponse) CompleteOutput[source]
class CompositeTracer(tracers: Sequence[TracerVar])[source]

Bases: Tracer, Generic[TracerVar]

A Tracer that allows for recording to multiple tracers simultaneously.

Each log-entry and span will be forwarded to all subtracers.

Parameters:

tracers – tracers that will be forwarded all subsequent log and span calls.

Example

>>> from core import InMemoryTracer, FileTracer, CompositeTracer, TextChunk
>>> from examples import PromptBasedClassify, ClassifyInput
>>> tracer_1 = InMemoryTracer()
>>> tracer_2 = InMemoryTracer()
>>> tracer = CompositeTracer([tracer_1, tracer_2])
>>> task = PromptBasedClassify()
>>> response = task.run(ClassifyInput(chunk=TextChunk("Cool"), labels=frozenset({"label", "other label"})), tracer)
export_for_viewing() Sequence[ExportedSpan][source]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

span(name: str, timestamp: datetime | None = None) CompositeSpan[Span][source]

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) CompositeTaskSpan[source]

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

class Context(*, trace_id: UUID, span_id: UUID)[source]

Bases: BaseModel

class ControlModel(name: str, client: AlephAlphaClientProtocol | None = None)[source]

Bases: AlephAlphaModel, ABC

complete(input: CompleteInput, tracer: Tracer) CompleteOutput
complete_task() Task[CompleteInput, CompleteOutput]
echo(prompt: str, expected_completion: str, tracer: Tracer) Sequence[tuple[Token, float | None]]

Echos the log probs for each token of an expected completion given a prompt.

Parameters:
  • prompt – The prompt to echo

  • expected_completion – The expected completion to get log probs for

  • tracer – Valid instance of a tracer

Returns:

A list of tuples with token identifier and log probability

explain(input: ExplainInput, tracer: Tracer) ExplainOutput
generate(prompt: str, tracer: Tracer) str

A completion function that takes a prompt and generates a completion.

Parameters:
  • prompt – The prompt to generate a completion for

  • tracer – Valid instance of a tracer

Returns:

An LLM completion

get_tokenizer() Tokenizer
get_tokenizer_no_whitespace_prefix() Tokenizer
to_instruct_prompt(instruction: str, input: str | None = None, response_prefix: str | None = None, instruction_controls: Sequence[TextControl] | None = None, input_controls: Sequence[TextControl] | None = None) RichPrompt[source]

Method to create an instruct-RichPrompt object to use with any ControlModel.

Parameters:
  • instruction – The task the model should fulfill, for example summarization

  • input – Any context necessary to solve the task, such as the text to be summarize

  • response_prefix – Optional argument to append a string to the beginning of the final agent message to steer the generation

  • instruction_controls – TextControls for the instruction part of the prompt. Only for text prompts.

  • input_controls – TextControls for the input part of the prompt. Only for text prompts.

Returns:

The rendered prompt with all variables filled in.

tokenize(text: str, whitespace_prefix: bool = True) Encoding
class DetectLanguage(threshold: float = 0.5)[source]

Bases: Task[DetectLanguageInput, DetectLanguageOutput]

Task that detects the language of a text.

Analyzes the likelihood that a given text is written in one of the possible_languages. Returns the best match or None.

Parameters:

threshold – Minimum probability value for a language to be considered the best_fit.

Example

>>> from core import (
...     DetectLanguage,
...     DetectLanguageInput,
...     InMemoryTracer,
...     Language,
... )
>>> task = DetectLanguage()
>>> input = DetectLanguageInput(
...     text="This is an English text.",
...     possible_languages=[Language(l) for l in ("en", "fr")],
... )
>>> output = task.run(input, InMemoryTracer())
do_run(input: DetectLanguageInput, task_span: TaskSpan) DetectLanguageOutput[source]

The implementation for this use case.

This takes an input and runs the implementation to generate an output. It takes a Span for tracing of the process. The Input and Output are logged by default.

Parameters:
  • input – Generic input defined by the task implementation

  • task_span – The Span used for tracing.

Returns:

Generic output defined by the task implementation.

run(input: Input, tracer: Tracer) Output

Executes the implementation of do_run for this use case.

This takes an input and runs the implementation to generate an output. It takes a Tracer for tracing of the process. The Input and Output are logged by default.

Parameters:
  • input – Generic input defined by the task implementation

  • tracer – The Tracer used for tracing.

Returns:

Generic output defined by the task implementation.

run_concurrently(inputs: Iterable[Input], tracer: Tracer, concurrency_limit: int = 20) Sequence[Output]

Executes multiple processes of this task concurrently.

Each provided input is potentially executed concurrently to the others. There is a global limit on the number of concurrently executed tasks that is shared by all tasks of all types.

Parameters:
  • inputs – The inputs that are potentially processed concurrently.

  • tracer – The tracer passed on the run method when executing a task.

  • concurrency_limit – An optional additional limit for the number of concurrently executed task for this method call. This can be used to prevent queue-full or similar error of downstream APIs when the global concurrency limit is too high for a certain task.

Returns:

The Outputs generated by calling run for each given Input. The order of Outputs corresponds to the order of the Inputs.

class DetectLanguageInput(*, text: str, possible_languages: Sequence[Language])[source]

Bases: BaseModel

The input for a DetectLanguage task.

text

The text to identify the language for.

Type:

str

possible_languages

All languages that should be considered during detection. Languages should be provided with their ISO 639-1 codes.

Type:

collections.abc.Sequence[pharia_inference_sdk.core.detect_language.Language]

class DetectLanguageOutput(*, best_fit: Language | None)[source]

Bases: BaseModel

The output of a DetectLanguage task.

best_fit

The prediction for the best matching language. Will be None if no language has a probability above the threshold.

Type:

pharia_inference_sdk.core.detect_language.Language | None

class Echo(model: AlephAlphaModel)[source]

Bases: Task[EchoInput, EchoOutput]

Task that returns probabilities of a completion given a prompt.

Analyzes the likelihood of generating tokens in the expected completion based on a given prompt and model. Does not generate any tokens.

Parameters:

model – A model to use in the task.

Example

>>> from aleph_alpha_client import Prompt
>>> from core import Echo, EchoInput, InMemoryTracer, LuminousControlModel
>>> model = LuminousControlModel(name="luminous-base-control")
>>> task = Echo(model)
>>> input = EchoInput(
...     prompt=Prompt.from_text("This is a "),
...     expected_completion="happy text",
... )
>>> tracer = InMemoryTracer()
>>> output = task.run(input, tracer)
do_run(input: EchoInput, task_span: TaskSpan) EchoOutput[source]

The implementation for this use case.

This takes an input and runs the implementation to generate an output. It takes a Span for tracing of the process. The Input and Output are logged by default.

Parameters:
  • input – Generic input defined by the task implementation

  • task_span – The Span used for tracing.

Returns:

Generic output defined by the task implementation.

run(input: Input, tracer: Tracer) Output

Executes the implementation of do_run for this use case.

This takes an input and runs the implementation to generate an output. It takes a Tracer for tracing of the process. The Input and Output are logged by default.

Parameters:
  • input – Generic input defined by the task implementation

  • tracer – The Tracer used for tracing.

Returns:

Generic output defined by the task implementation.

run_concurrently(inputs: Iterable[Input], tracer: Tracer, concurrency_limit: int = 20) Sequence[Output]

Executes multiple processes of this task concurrently.

Each provided input is potentially executed concurrently to the others. There is a global limit on the number of concurrently executed tasks that is shared by all tasks of all types.

Parameters:
  • inputs – The inputs that are potentially processed concurrently.

  • tracer – The tracer passed on the run method when executing a task.

  • concurrency_limit – An optional additional limit for the number of concurrently executed task for this method call. This can be used to prevent queue-full or similar error of downstream APIs when the global concurrency limit is too high for a certain task.

Returns:

The Outputs generated by calling run for each given Input. The order of Outputs corresponds to the order of the Inputs.

class EchoInput(*, prompt: Prompt, expected_completion: str)[source]

Bases: BaseModel

The input for an Echo task.

prompt

The input text that serves as the starting point for the LLM.

Type:

aleph_alpha_client.prompt.Prompt

expected_completion

The desired completion based on the prompt. The likelihood of the tokens in this will be examined.

Type:

str

class EchoOutput(*, tokens_with_log_probs: Sequence[TokenWithLogProb])[source]

Bases: BaseModel

The output of an Echo task.

tokens_with_log_probs

Every token of the expected_completion of the EchoInput accompanied by its probability of having been generated in a completion scenario.

Type:

collections.abc.Sequence[pharia_inference_sdk.core.echo.TokenWithLogProb]

class EndSpan(*, uuid: UUID, end: datetime, status_code: SpanStatus = SpanStatus.OK)[source]

Bases: BaseModel

Represents the payload/entry of a log-line that indicates that a Span ended.

uuid

The uuid of the corresponding StartSpan.

Type:

uuid.UUID

end

the timestamp when this Span completed.

Type:

datetime.datetime

class EndTask(*, uuid: UUID, end: datetime, output: Annotated[PydanticSerializable, SerializeAsAny()], status_code: SpanStatus = SpanStatus.OK)[source]

Bases: BaseModel

Represents the payload/entry of a log-line that indicates that a TaskSpan ended (i.e. the context-manager exited).

uuid

The uuid of the corresponding StartTask.

Type:

uuid.UUID

end

the timestamp when this Task completed (i.e. run returned).

Type:

datetime.datetime

output

the Output (i.e. return value of run) the Task returned.

Type:

PydanticSerializable

class ErrorValue(*, error_type: str, message: str, stack_trace: str)[source]

Bases: BaseModel

class Event(*, name: str, message: str, body: ~pharia_inference_sdk.core.tracer.tracer.Annotated[PydanticSerializable, ~pydantic.functional_serializers.SerializeAsAny()], timestamp: ~datetime.datetime = <factory>)[source]

Bases: BaseModel

class ExplainInput(*, prompt: Prompt, target: str, contextual_control_threshold: float | None = None, control_factor: float | None = None, control_token_overlap: ControlTokenOverlap | None = None, control_log_additive: bool | None = None, prompt_granularity: PromptGranularity | str | CustomGranularity | None = None, target_granularity: TargetGranularity | None = None, postprocessing: ExplanationPostprocessing | None = None, normalize: bool | None = None)[source]

Bases: BaseModel, ExplanationRequest

The input for a Explain task.

to_explanation_request() ExplanationRequest[source]
class ExplainOutput(*, model_version: str, explanations: List[Explanation])[source]

Bases: BaseModel, ExplanationResponse

The output of a Explain task.

static from_explanation_response(explanation_response: ExplanationResponse) ExplainOutput[source]
class ExportedSpan(*, context: Context, name: str | None, parent_id: UUID | None, start_time: datetime, end_time: datetime, attributes: SpanAttributes | TaskSpanAttributes, events: Sequence[Event], status: SpanStatus)[source]

Bases: BaseModel

ExportedSpanList

alias of RootModel[Sequence[ExportedSpan]]

class FileSpan(log_file_path: Path, context: Context | None = None)[source]

Bases: PersistentSpan, FileTracer

A Span created by FileTracer.span.

convert_file_for_viewing(file_path: Path | str) None
end(timestamp: datetime | None = None) None

Marks the Span as closed and sets the end time.

The Span should be regarded as complete, and no further logging should happen with it.

Ending a closed span in undefined behavior.

Parameters:

timestamp – Optional override of the timestamp. Defaults to call time.

export_for_viewing() Sequence[ExportedSpan]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

log(message: str, value: PydanticSerializable, timestamp: datetime | None = None) None

Record a log of relevant information as part of a step within a task.

By default, the Input and Output of each Task are logged automatically, but you can log anything else that seems relevant to understanding the process of a given task.

Logging to closed spans is undefined behavior.

Parameters:
  • message – A description of the value you are logging, such as the step in the task this is related to.

  • value – The relevant data you want to log. Can be anything that is serializable by Pydantic, which gives the tracers flexibility in how they store and emit the logs.

  • timestamp – optional override of the timestamp. Otherwise should default to now

span(name: str, timestamp: datetime | None = None) FileSpan

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) FileTaskSpan

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

traces(trace_id: str | None = None) InMemoryTracer

Returns all traces of the given tracer.

Returns:

An InMemoryTracer that contains all traces of the tracer.

class FileTaskSpan(log_file_path: Path, context: Context | None = None)[source]

Bases: PersistentTaskSpan, FileSpan

A TaskSpan created by FileTracer.task_span.

convert_file_for_viewing(file_path: Path | str) None
end(timestamp: datetime | None = None) None

Marks the Span as closed and sets the end time.

The Span should be regarded as complete, and no further logging should happen with it.

Ending a closed span in undefined behavior.

Parameters:

timestamp – Optional override of the timestamp. Defaults to call time.

export_for_viewing() Sequence[ExportedSpan]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

log(message: str, value: PydanticSerializable, timestamp: datetime | None = None) None

Record a log of relevant information as part of a step within a task.

By default, the Input and Output of each Task are logged automatically, but you can log anything else that seems relevant to understanding the process of a given task.

Logging to closed spans is undefined behavior.

Parameters:
  • message – A description of the value you are logging, such as the step in the task this is related to.

  • value – The relevant data you want to log. Can be anything that is serializable by Pydantic, which gives the tracers flexibility in how they store and emit the logs.

  • timestamp – optional override of the timestamp. Otherwise should default to now

record_output(output: PydanticSerializable) None

Record Task output.

Since a Context Manager can’t capture output in the __exit__ method, output should be captured once it is generated.

Parameters:

output – The output of the task that is being logged.

span(name: str, timestamp: datetime | None = None) FileSpan

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) FileTaskSpan

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

traces(trace_id: str | None = None) InMemoryTracer

Returns all traces of the given tracer.

Returns:

An InMemoryTracer that contains all traces of the tracer.

class FileTracer(log_file_path: Path | str)[source]

Bases: PersistentTracer

A Tracer that logs to a file.

Each log-entry is represented by a JSON object. The information logged allows to reconstruct the hierarchical nature of the logs, i.e. all entries have a _pointer_ to its parent element in form of a parent attribute containing the uuid of the parent.

Parameters:

log_file_path – Denotes the file to log to.

uuid

a uuid for the tracer. If multiple FileTracer instances log to the same file the child-elements for a tracer can be identified by referring to this id as parent.

convert_file_for_viewing(file_path: Path | str) None[source]
export_for_viewing() Sequence[ExportedSpan]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

span(name: str, timestamp: datetime | None = None) FileSpan[source]

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) FileTaskSpan[source]

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

traces(trace_id: str | None = None) InMemoryTracer[source]

Returns all traces of the given tracer.

Returns:

An InMemoryTracer that contains all traces of the tracer.

class FinetuningMessage(*, has_loss: bool, content: str, type: str = 'text')[source]

Bases: BaseModel

Represent a prompt message in a finetuning sample as required to finetune an llm using [scaling](https://github.com/Aleph-Alpha/scaling).

Parameters:
  • has_loss – Flag indicated whether loss should be applied to the message during training.

  • content – The text in the message

  • type – Should always be “text”

class InMemorySpan(name: str, context: Context | None = None, start_timestamp: datetime | None = None)[source]

Bases: InMemoryTracer, Span

A span that keeps all important information in memory.

context

Ids that uniquely describe the span.

Type:

pharia_inference_sdk.core.tracer.tracer.Context

parent_id

Id of the parent span. None if the span is a root span.

name

The name of the span.

start_timestamp

The start of the timestamp.

end_timestamp

The end of the timestamp. None until the span is closed.

status_code

The status of the context.

end(timestamp: datetime | None = None) None[source]

Marks the Span as closed and sets the end time.

The Span should be regarded as complete, and no further logging should happen with it.

Ending a closed span in undefined behavior.

Parameters:

timestamp – Optional override of the timestamp. Defaults to call time.

export_for_viewing() Sequence[ExportedSpan][source]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

log(message: str, value: PydanticSerializable, timestamp: datetime | None = None) None[source]

Record a log of relevant information as part of a step within a task.

By default, the Input and Output of each Task are logged automatically, but you can log anything else that seems relevant to understanding the process of a given task.

Logging to closed spans is undefined behavior.

Parameters:
  • message – A description of the value you are logging, such as the step in the task this is related to.

  • value – The relevant data you want to log. Can be anything that is serializable by Pydantic, which gives the tracers flexibility in how they store and emit the logs.

  • timestamp – optional override of the timestamp. Otherwise should default to now

span(name: str, timestamp: datetime | None = None) InMemorySpan

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) InMemoryTaskSpan

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

class InMemoryTaskSpan(name: str, input: Annotated[PydanticSerializable, SerializeAsAny()], context: Context | None = None, start_timestamp: datetime | None = None)[source]

Bases: InMemorySpan, TaskSpan

A span of a task that keeps all important information in memory.

context

Ids that uniquely describe the span.

Type:

pharia_inference_sdk.core.tracer.tracer.Context

parent_id

Id of the parent span. None if the span is a root span.

name

The name of the span.

start_timestamp

The start of the timestamp.

end_timestamp

The end of the timestamp. None until the span is closed.

status_code

The status of the context.

input

The input of the task.

output

The output of the task.

end(timestamp: datetime | None = None) None

Marks the Span as closed and sets the end time.

The Span should be regarded as complete, and no further logging should happen with it.

Ending a closed span in undefined behavior.

Parameters:

timestamp – Optional override of the timestamp. Defaults to call time.

export_for_viewing() Sequence[ExportedSpan]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

log(message: str, value: PydanticSerializable, timestamp: datetime | None = None) None

Record a log of relevant information as part of a step within a task.

By default, the Input and Output of each Task are logged automatically, but you can log anything else that seems relevant to understanding the process of a given task.

Logging to closed spans is undefined behavior.

Parameters:
  • message – A description of the value you are logging, such as the step in the task this is related to.

  • value – The relevant data you want to log. Can be anything that is serializable by Pydantic, which gives the tracers flexibility in how they store and emit the logs.

  • timestamp – optional override of the timestamp. Otherwise should default to now

record_output(output: PydanticSerializable) None[source]

Record Task output.

Since a Context Manager can’t capture output in the __exit__ method, output should be captured once it is generated.

Parameters:

output – The output of the task that is being logged.

span(name: str, timestamp: datetime | None = None) InMemorySpan

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) InMemoryTaskSpan

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

class InMemoryTracer[source]

Bases: Tracer

Collects log entries in a nested structure, and keeps them in memory.

entries

A sequential list of log entries and/or nested InMemoryTracers with their own log entries.

export_for_viewing() Sequence[ExportedSpan][source]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

span(name: str, timestamp: datetime | None = None) InMemorySpan[source]

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) InMemoryTaskSpan[source]

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

class Instruct(model: ControlModel)[source]

Bases: Task[InstructInput, CompleteOutput]

do_run(input: InstructInput, task_span: TaskSpan) CompleteOutput[source]

The implementation for this use case.

This takes an input and runs the implementation to generate an output. It takes a Span for tracing of the process. The Input and Output are logged by default.

Parameters:
  • input – Generic input defined by the task implementation

  • task_span – The Span used for tracing.

Returns:

Generic output defined by the task implementation.

run(input: Input, tracer: Tracer) Output

Executes the implementation of do_run for this use case.

This takes an input and runs the implementation to generate an output. It takes a Tracer for tracing of the process. The Input and Output are logged by default.

Parameters:
  • input – Generic input defined by the task implementation

  • tracer – The Tracer used for tracing.

Returns:

Generic output defined by the task implementation.

run_concurrently(inputs: Iterable[Input], tracer: Tracer, concurrency_limit: int = 20) Sequence[Output]

Executes multiple processes of this task concurrently.

Each provided input is potentially executed concurrently to the others. There is a global limit on the number of concurrently executed tasks that is shared by all tasks of all types.

Parameters:
  • inputs – The inputs that are potentially processed concurrently.

  • tracer – The tracer passed on the run method when executing a task.

  • concurrency_limit – An optional additional limit for the number of concurrently executed task for this method call. This can be used to prevent queue-full or similar error of downstream APIs when the global concurrency limit is too high for a certain task.

Returns:

The Outputs generated by calling run for each given Input. The order of Outputs corresponds to the order of the Inputs.

class InstructInput(*, instruction: str, input: str | None = None, response_prefix: str | None = None, maximum_tokens: int = 128)[source]

Bases: BaseModel

class JsonSerializer(root: RootModelRootType = PydanticUndefined)[source]

Bases: RootModel[PydanticSerializable]

classmethod model_construct(root: RootModelRootType, _fields_set: set[str] | None = None) Self

Create a new model using the provided root object and update fields set.

Parameters:
  • root – The root object of the model.

  • _fields_set – The set of fields to be updated.

Returns:

The new model.

Raises:

NotImplemented – If the model is not a subclass of RootModel.

class Language(iso_639_1: str)[source]

Bases: object

A language identified by its ISO 639-1 code.

get_name() str | None[source]
language_config(configs: Mapping[Language, Config]) Config[source]
to_lingua_language() Language[source]
class LanguageModel(name: str)[source]

Bases: ABC

Abstract base class to implement any LLM.

abstract echo(prompt: str, expected_completion: str, tracer: Tracer) Sequence[tuple[Any, float | None]][source]

Echos the log probs for each token of an expected completion given a prompt.

Parameters:
  • prompt – The prompt to echo

  • expected_completion – The expected completion to get log probs for

  • tracer – Valid instance of a tracer

Returns:

A list of tuples with token identifier and log probability

abstract generate(prompt: str, tracer: Tracer) str[source]

A completion function that takes a prompt and generates a completion.

Parameters:
  • prompt – The prompt to generate a completion for

  • tracer – Valid instance of a tracer

Returns:

An LLM completion

class Llama2InstructModel(name: str = 'llama-2-13b-chat', client: AlephAlphaClientProtocol | None = None)[source]

Bases: ControlModel

A llama-2-*-chat model, prompt-optimized for single-turn instructions.

If possible, we recommend using Llama3InstructModel instead.

Parameters:
  • name – The name of a valid llama-2 model. Defaults to llama-2-13b-chat

  • client – Aleph Alpha client instance for running model related API calls. Defaults to LimitedConcurrencyClient

complete(input: CompleteInput, tracer: Tracer) CompleteOutput
complete_task() Task[CompleteInput, CompleteOutput]
echo(prompt: str, expected_completion: str, tracer: Tracer) Sequence[tuple[Token, float | None]]

Echos the log probs for each token of an expected completion given a prompt.

Parameters:
  • prompt – The prompt to echo

  • expected_completion – The expected completion to get log probs for

  • tracer – Valid instance of a tracer

Returns:

A list of tuples with token identifier and log probability

explain(input: ExplainInput, tracer: Tracer) ExplainOutput
generate(prompt: str, tracer: Tracer) str

A completion function that takes a prompt and generates a completion.

Parameters:
  • prompt – The prompt to generate a completion for

  • tracer – Valid instance of a tracer

Returns:

An LLM completion

get_tokenizer() Tokenizer
get_tokenizer_no_whitespace_prefix() Tokenizer
to_instruct_prompt(instruction: str, input: str | None = None, response_prefix: str | None = None, instruction_controls: Sequence[TextControl] | None = None, input_controls: Sequence[TextControl] | None = None) RichPrompt

Method to create an instruct-RichPrompt object to use with any ControlModel.

Parameters:
  • instruction – The task the model should fulfill, for example summarization

  • input – Any context necessary to solve the task, such as the text to be summarize

  • response_prefix – Optional argument to append a string to the beginning of the final agent message to steer the generation

  • instruction_controls – TextControls for the instruction part of the prompt. Only for text prompts.

  • input_controls – TextControls for the input part of the prompt. Only for text prompts.

Returns:

The rendered prompt with all variables filled in.

tokenize(text: str, whitespace_prefix: bool = True) Encoding
class Llama3ChatModel(name: str = 'llama-3.1-8b-instruct', client: AlephAlphaClientProtocol | None = None)[source]

Bases: AlephAlphaChatModel

Chat model to be used for llama-3-* and llama-3.1-* models.

Parameters:
  • name – The name of a valid llama-3 model. Defaults to llama-3-8b-instruct

  • client – Aleph Alpha client instance for running model related API calls. Defaults to LimitedConcurrencyClient

complete(input: CompleteInput, tracer: Tracer) CompleteOutput
complete_task() Task[CompleteInput, CompleteOutput]
echo(prompt: str, expected_completion: str, tracer: Tracer) Sequence[tuple[Token, float | None]]

Echos the log probs for each token of an expected completion given a prompt.

Parameters:
  • prompt – The prompt to echo

  • expected_completion – The expected completion to get log probs for

  • tracer – Valid instance of a tracer

Returns:

A list of tuples with token identifier and log probability

echo_chat(messages: list[Message], response_prefix: str | None, expected_completion: str, tracer: Tracer) Sequence[tuple[Any, float | None]]

Echos the log probs for each token of an expected completion given a prompt.

Parameters:
  • messages – The messages to be used as prompt

  • response_prefix – Append the given string to the beginning of the final agent message to steer the generation.

  • expected_completion – The expected completion to get log probs for

  • tracer – Valid instance of a tracer

Returns:

A list of tuples with token identifier and log probability

explain(input: ExplainInput, tracer: Tracer) ExplainOutput
generate(prompt: str, tracer: Tracer) str

A completion function that takes a prompt and generates a completion.

Parameters:
  • prompt – The prompt to generate a completion for

  • tracer – Valid instance of a tracer

Returns:

An LLM completion

generate_chat(messages: Sequence[Message], response_prefix: str | None, tracer: Tracer) str

Generate a raw completion to messages for any AlephAlphaChatModel.

Parameters:
  • messages – A number of messages to use as prompt for the model

  • response_prefix – Optional argument to append a string to the beginning of the final agent message to steer the generation

  • tracer – Valid instance of a tracer

Returns:

An LLM completion

get_tokenizer() Tokenizer
get_tokenizer_no_whitespace_prefix() Tokenizer
to_chat_prompt(messages: Sequence[Message], response_prefix: str | None = None) RichPrompt

Method to create a chat-RichPrompt object to use with any AlephAlphaModel.

Parameters:
  • messages – A number of messages to use as prompt for the model

  • response_prefix – Append the given string to the beginning of the final agent message to steer the generation. Defaults to None.

Returns:

A RichPrompt object to be consumed by the Aleph Alpha client

to_finetuning_sample(messages: Sequence[Message]) Sequence[FinetuningMessage][source]

Abstract function allowing a user to define what the model’s finetuning samples should look like.

Parameters:

messages – The messages making up the finetuning sample

Returns:

A finetuning sample containing the input messages

to_instruct_prompt(instruction: str, input: str | None = None, response_prefix: str | None = None, instruction_controls: Sequence[TextControl] | None = None, input_controls: Sequence[TextControl] | None = None) RichPrompt

Method to use a chat model like an instruct model`.

Parameters:
  • instruction – The task the model should fulfill, for example summarization

  • input – Any context necessary to solve the task, such as the text to be summarized

  • response_prefix – Optional argument to append a string to the beginning of the final agent message to steer the generation

  • instruction_controls – Instruction controls are not used but needed for the interface.

  • input_controls – Input controls are not used but needed for the interface

Returns:

The rendered prompt with all variables filled in.

tokenize(text: str, whitespace_prefix: bool = True) Encoding
class Llama3InstructModel(name: str = 'llama-3.1-8b-instruct', client: AlephAlphaClientProtocol | None = None)[source]

Bases: ControlModel

A llama-3-*-instruct model.

Parameters:
  • name – The name of a valid llama-3 model. Defaults to llama-3.1-8b-instruct

  • client – Aleph Alpha client instance for running model related API calls. Defaults to LimitedConcurrencyClient

complete(input: CompleteInput, tracer: Tracer) CompleteOutput
complete_task() Task[CompleteInput, CompleteOutput]
echo(prompt: str, expected_completion: str, tracer: Tracer) Sequence[tuple[Token, float | None]]

Echos the log probs for each token of an expected completion given a prompt.

Parameters:
  • prompt – The prompt to echo

  • expected_completion – The expected completion to get log probs for

  • tracer – Valid instance of a tracer

Returns:

A list of tuples with token identifier and log probability

explain(input: ExplainInput, tracer: Tracer) ExplainOutput
generate(prompt: str, tracer: Tracer) str

A completion function that takes a prompt and generates a completion.

Parameters:
  • prompt – The prompt to generate a completion for

  • tracer – Valid instance of a tracer

Returns:

An LLM completion

get_tokenizer() Tokenizer
get_tokenizer_no_whitespace_prefix() Tokenizer
to_instruct_prompt(instruction: str, input: str | None = None, response_prefix: str | None = None, instruction_controls: Sequence[TextControl] | None = None, input_controls: Sequence[TextControl] | None = None) RichPrompt

Method to create an instruct-RichPrompt object to use with any ControlModel.

Parameters:
  • instruction – The task the model should fulfill, for example summarization

  • input – Any context necessary to solve the task, such as the text to be summarize

  • response_prefix – Optional argument to append a string to the beginning of the final agent message to steer the generation

  • instruction_controls – TextControls for the instruction part of the prompt. Only for text prompts.

  • input_controls – TextControls for the input part of the prompt. Only for text prompts.

Returns:

The rendered prompt with all variables filled in.

tokenize(text: str, whitespace_prefix: bool = True) Encoding
class LogEntry(*, message: str, value: ~pharia_inference_sdk.core.tracer.tracer.Annotated[PydanticSerializable, ~pydantic.functional_serializers.SerializeAsAny()], timestamp: ~datetime.datetime = <factory>, trace_id: ~uuid.UUID)[source]

Bases: BaseModel

An individual log entry, currently used to represent individual logs by the InMemoryTracer.

message

A description of the value you are logging, such as the step in the task this is related to.

Type:

str

value

The relevant data you want to log. Can be anything that is serializable by Pydantic, which gives the tracers flexibility in how they store and emit the logs.

Type:

PydanticSerializable

timestamp

The time that the log was emitted.

Type:

datetime.datetime

id

The ID of the trace to which this log entry belongs.

class LogLine(*, trace_id: UUID, entry_type: str, entry: Annotated[PydanticSerializable, SerializeAsAny()])[source]

Bases: BaseModel

Represents a complete log-line.

entry_type

The type of the entry. This is the class-name of one of the classes representing a log-entry (e.g. “StartTask”).

Type:

str

entry

The actual entry.

Type:

PydanticSerializable

class LuminousControlModel(name: str = 'luminous-base-control', client: AlephAlphaClientProtocol | None = None)[source]

Bases: ControlModel

An Aleph Alpha control model of the second generation.

Parameters:
  • name – The name of a valid model second generation control model. Defaults to luminous-base-control

  • client – Aleph Alpha client instance for running model related API calls. Defaults to LimitedConcurrencyClient

complete(input: CompleteInput, tracer: Tracer) CompleteOutput
complete_task() Task[CompleteInput, CompleteOutput]
echo(prompt: str, expected_completion: str, tracer: Tracer) Sequence[tuple[Token, float | None]]

Echos the log probs for each token of an expected completion given a prompt.

Parameters:
  • prompt – The prompt to echo

  • expected_completion – The expected completion to get log probs for

  • tracer – Valid instance of a tracer

Returns:

A list of tuples with token identifier and log probability

explain(input: ExplainInput, tracer: Tracer) ExplainOutput
generate(prompt: str, tracer: Tracer) str

A completion function that takes a prompt and generates a completion.

Parameters:
  • prompt – The prompt to generate a completion for

  • tracer – Valid instance of a tracer

Returns:

An LLM completion

get_tokenizer() Tokenizer
get_tokenizer_no_whitespace_prefix() Tokenizer
to_instruct_prompt(instruction: str, input: str | None = None, response_prefix: str | None = None, instruction_controls: Sequence[TextControl] | None = None, input_controls: Sequence[TextControl] | None = None) RichPrompt

Method to create an instruct-RichPrompt object to use with any ControlModel.

Parameters:
  • instruction – The task the model should fulfill, for example summarization

  • input – Any context necessary to solve the task, such as the text to be summarize

  • response_prefix – Optional argument to append a string to the beginning of the final agent message to steer the generation

  • instruction_controls – TextControls for the instruction part of the prompt. Only for text prompts.

  • input_controls – TextControls for the input part of the prompt. Only for text prompts.

Returns:

The rendered prompt with all variables filled in.

tokenize(text: str, whitespace_prefix: bool = True) Encoding
class Message(*, role: Literal['system', 'user', 'assistant'], content: str)[source]

Bases: BaseModel

class NoOpTracer(context: Context | None = None)[source]

Bases: TaskSpan

A no-op tracer.

Useful for cases, like testing, where a tracer is needed for a task, but you don’t have a need to collect or inspect the actual logs.

All calls to log won’t actually do anything.

end(timestamp: datetime | None = None) None[source]

Marks the Span as closed and sets the end time.

The Span should be regarded as complete, and no further logging should happen with it.

Ending a closed span in undefined behavior.

Parameters:

timestamp – Optional override of the timestamp. Defaults to call time.

export_for_viewing() Sequence[ExportedSpan][source]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

id() str[source]
log(message: str, value: PydanticSerializable, timestamp: datetime | None = None) None[source]

Record a log of relevant information as part of a step within a task.

By default, the Input and Output of each Task are logged automatically, but you can log anything else that seems relevant to understanding the process of a given task.

Logging to closed spans is undefined behavior.

Parameters:
  • message – A description of the value you are logging, such as the step in the task this is related to.

  • value – The relevant data you want to log. Can be anything that is serializable by Pydantic, which gives the tracers flexibility in how they store and emit the logs.

  • timestamp – optional override of the timestamp. Otherwise should default to now

record_output(output: PydanticSerializable) None[source]

Record Task output.

Since a Context Manager can’t capture output in the __exit__ method, output should be captured once it is generated.

Parameters:

output – The output of the task that is being logged.

span(name: str, timestamp: datetime | None = None) NoOpTracer[source]

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) NoOpTracer[source]

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

class OpenTelemetryTracer(tracer: Tracer)[source]

Bases: Tracer

A Tracer that uses open telemetry.

export_for_viewing() Sequence[ExportedSpan][source]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

span(name: str, timestamp: datetime | None = None) OpenTelemetrySpan[source]

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) OpenTelemetryTaskSpan[source]

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

class PersistentSpan(context: Context | None = None)[source]

Bases: Span, PersistentTracer, ABC

end(timestamp: datetime | None = None) None[source]

Marks the Span as closed and sets the end time.

The Span should be regarded as complete, and no further logging should happen with it.

Ending a closed span in undefined behavior.

Parameters:

timestamp – Optional override of the timestamp. Defaults to call time.

export_for_viewing() Sequence[ExportedSpan]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

log(message: str, value: PydanticSerializable, timestamp: datetime | None = None) None[source]

Record a log of relevant information as part of a step within a task.

By default, the Input and Output of each Task are logged automatically, but you can log anything else that seems relevant to understanding the process of a given task.

Logging to closed spans is undefined behavior.

Parameters:
  • message – A description of the value you are logging, such as the step in the task this is related to.

  • value – The relevant data you want to log. Can be anything that is serializable by Pydantic, which gives the tracers flexibility in how they store and emit the logs.

  • timestamp – optional override of the timestamp. Otherwise should default to now

abstract span(name: str, timestamp: datetime | None = None) Span

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

abstract task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) TaskSpan

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

abstract traces() InMemoryTracer

Returns all traces of the given tracer.

Returns:

An InMemoryTracer that contains all traces of the tracer.

class PersistentTaskSpan(context: Context | None = None)[source]

Bases: TaskSpan, PersistentSpan, ABC

end(timestamp: datetime | None = None) None[source]

Marks the Span as closed and sets the end time.

The Span should be regarded as complete, and no further logging should happen with it.

Ending a closed span in undefined behavior.

Parameters:

timestamp – Optional override of the timestamp. Defaults to call time.

export_for_viewing() Sequence[ExportedSpan]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

log(message: str, value: PydanticSerializable, timestamp: datetime | None = None) None

Record a log of relevant information as part of a step within a task.

By default, the Input and Output of each Task are logged automatically, but you can log anything else that seems relevant to understanding the process of a given task.

Logging to closed spans is undefined behavior.

Parameters:
  • message – A description of the value you are logging, such as the step in the task this is related to.

  • value – The relevant data you want to log. Can be anything that is serializable by Pydantic, which gives the tracers flexibility in how they store and emit the logs.

  • timestamp – optional override of the timestamp. Otherwise should default to now

record_output(output: PydanticSerializable) None[source]

Record Task output.

Since a Context Manager can’t capture output in the __exit__ method, output should be captured once it is generated.

Parameters:

output – The output of the task that is being logged.

abstract span(name: str, timestamp: datetime | None = None) Span

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

abstract task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) TaskSpan

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

abstract traces() InMemoryTracer

Returns all traces of the given tracer.

Returns:

An InMemoryTracer that contains all traces of the tracer.

class PersistentTracer[source]

Bases: Tracer, ABC

export_for_viewing() Sequence[ExportedSpan][source]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

abstract span(name: str, timestamp: datetime | None = None) Span

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

abstract task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) TaskSpan

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

abstract traces() InMemoryTracer[source]

Returns all traces of the given tracer.

Returns:

An InMemoryTracer that contains all traces of the tracer.

class Pharia1ChatModel(name: str = 'pharia-1-llm-7b-control', client: AlephAlphaClientProtocol | None = None)[source]

Bases: AlephAlphaChatModel

Chat model to be used for any “pharia-1-llm-* model.

Parameters:
  • name – The name of a valid Pharia-1 model. Defaults to pharia-1-llm-7b-control

  • client – Aleph Alpha client instance for running model related API calls. Defaults to LimitedConcurrencyClient

complete(input: CompleteInput, tracer: Tracer) CompleteOutput[source]
complete_task() Task[CompleteInput, CompleteOutput]
echo(prompt: str, expected_completion: str, tracer: Tracer) Sequence[tuple[Token, float | None]]

Echos the log probs for each token of an expected completion given a prompt.

Parameters:
  • prompt – The prompt to echo

  • expected_completion – The expected completion to get log probs for

  • tracer – Valid instance of a tracer

Returns:

A list of tuples with token identifier and log probability

echo_chat(messages: list[Message], response_prefix: str | None, expected_completion: str, tracer: Tracer) Sequence[tuple[Any, float | None]]

Echos the log probs for each token of an expected completion given a prompt.

Parameters:
  • messages – The messages to be used as prompt

  • response_prefix – Append the given string to the beginning of the final agent message to steer the generation.

  • expected_completion – The expected completion to get log probs for

  • tracer – Valid instance of a tracer

Returns:

A list of tuples with token identifier and log probability

explain(input: ExplainInput, tracer: Tracer) ExplainOutput
generate(prompt: str, tracer: Tracer) str

A completion function that takes a prompt and generates a completion.

Parameters:
  • prompt – The prompt to generate a completion for

  • tracer – Valid instance of a tracer

Returns:

An LLM completion

generate_chat(messages: Sequence[Message], response_prefix: str | None, tracer: Tracer) str

Generate a raw completion to messages for any AlephAlphaChatModel.

Parameters:
  • messages – A number of messages to use as prompt for the model

  • response_prefix – Optional argument to append a string to the beginning of the final agent message to steer the generation

  • tracer – Valid instance of a tracer

Returns:

An LLM completion

get_tokenizer() Tokenizer
get_tokenizer_no_whitespace_prefix() Tokenizer
to_chat_prompt(messages: Sequence[Message], response_prefix: str | None = None) RichPrompt

Method to create a chat-RichPrompt object to use with any AlephAlphaModel.

Parameters:
  • messages – A number of messages to use as prompt for the model

  • response_prefix – Append the given string to the beginning of the final agent message to steer the generation. Defaults to None.

Returns:

A RichPrompt object to be consumed by the Aleph Alpha client

to_finetuning_sample(messages: Sequence[Message]) Sequence[FinetuningMessage][source]

Abstract function allowing a user to define what the model’s finetuning samples should look like.

Parameters:

messages – The messages making up the finetuning sample

Returns:

A finetuning sample containing the input messages

to_instruct_prompt(instruction: str, input: str | None = None, response_prefix: str | None = None, instruction_controls: Sequence[TextControl] | None = None, input_controls: Sequence[TextControl] | None = None) RichPrompt

Method to use a chat model like an instruct model`.

Parameters:
  • instruction – The task the model should fulfill, for example summarization

  • input – Any context necessary to solve the task, such as the text to be summarized

  • response_prefix – Optional argument to append a string to the beginning of the final agent message to steer the generation

  • instruction_controls – Instruction controls are not used but needed for the interface.

  • input_controls – Input controls are not used but needed for the interface

Returns:

The rendered prompt with all variables filled in.

tokenize(text: str, whitespace_prefix: bool = True) Encoding
class PlainEntry(*, message: str, value: Annotated[PydanticSerializable, SerializeAsAny()], timestamp: datetime, parent: UUID, trace_id: UUID)[source]

Bases: BaseModel

Represents a plain log-entry created through Tracer.log.

message

the message-parameter of Tracer.log

Type:

str

value

the value-parameter of Tracer.log

Type:

PydanticSerializable

timestamp

the timestamp when Tracer.log was called.

Type:

datetime.datetime

parent

The unique id of the parent element of the log. This could refer to either a surrounding TaskSpan, Span or the top-level Tracer.

Type:

uuid.UUID

trace_id

The ID of the trace this entry belongs to.

Type:

uuid.UUID

class PromptItemCursor(item: int)[source]

Bases: object

Defines a position with a non-Text prompt item.

Parameters:

item – the index of the prompt item within the Prompt

class PromptRange(start: TextCursor | PromptItemCursor, end: TextCursor | PromptItemCursor)[source]

Bases: object

Defines a range within a Prompt.

class PromptTemplate(template_str: str)[source]

Bases: object

Allows to build a Prompt using the liquid template language.

To add non-text prompt items first you have to save it to the template with the template.placeholder() function. To embed the items in the template, pass the placeholder in the place(s) where you would like the items.

Example

>>> from aleph_alpha_client import CompletionRequest, Tokens
>>> from core import PromptTemplate
>>> tokens = Tokens.from_token_ids([1, 2, 3])
>>> template = PromptTemplate(
...     '''{%- for name in names -%}
...     Hello {{name}}!
...     {% endfor -%}
...     {{ image }}
...     ''')
>>> placeholder = template.placeholder(tokens)
>>> names = ["World", "Rutger"]
>>> prompt = template.to_rich_prompt(names=names, image=placeholder)
>>> request = CompletionRequest(prompt=prompt)
embed_prompt(prompt: Prompt) str[source]

Embeds a prompt in a prompt template.

Adds whitespace between text items if there is no whitespace between them. In case of non-text prompt items, this embeds them into the end result.

Parameters:

prompt – prompt to embed in the template

Example

>>> from aleph_alpha_client import Prompt, Text, Tokens
>>> from core import PromptTemplate
>>> user_prompt = Prompt([
... Tokens.from_token_ids([1, 2, 3]),
... Text.from_text("cool"),
... ])
>>> template = PromptTemplate("Question: {{user_prompt}}\n Answer: ")
>>> prompt = template.to_rich_prompt(user_prompt=template.embed_prompt(user_prompt))
Returns:

The prompt template with the embedded prompt.

placeholder(value: Image | Tokens) Placeholder[source]

Saves a non-text prompt item to the template and returns a placeholder.

The placeholder is used to embed the prompt item in the template

Parameters:

value – Tokens to store

Returns:

A placeholder for the given non-text item.

to_rich_prompt(**kwargs: Any) RichPrompt[source]

Creates a Prompt along with metadata from the template string and the given parameters.

Currently, the only metadata returned is information about ranges that are marked in the template. Provided parameters are passed to liquid.Template.render.

Parameters:

**kwargs – Parameters to enrich prompt with

Returns:

The rendered prompt as a RichPrompt

class RichPrompt(items: ~typing.Sequence[~aleph_alpha_client.prompt.Text | ~aleph_alpha_client.prompt.Tokens | ~aleph_alpha_client.prompt.Image], ranges: ~collections.abc.Mapping[str, ~collections.abc.Sequence[~pharia_inference_sdk.core.prompt_template.PromptRange]] = <factory>)[source]

Bases: Prompt

The Prompt along with some metadata generated when a PromptTemplate is turned into a Prompt.

Parameters:

ranges – A mapping of range name to a Sequence of corresponding PromptRange instances.

static from_image(image: Image) Prompt
static from_json(items_json: Sequence[Mapping[str, Any]]) Prompt
static from_text(text: str, controls: Sequence[TextControl] | None = None) Prompt
static from_tokens(tokens: Sequence[int], controls: Sequence[TokenControl] | None = None) Prompt

Examples

>>> prompt = Prompt.from_tokens([1, 2, 3])
to_json() Sequence[Mapping[str, Any]]
class ScoredTextHighlight(*, start: int, end: int, score: float)[source]

Bases: BaseModel

A substring of the input prompt scored for relevance with regard to the output.

start

The start index of the highlight.

Type:

int

end

The end index of the highlight.

Type:

int

score

The score of the highlight. Normalized to be between zero and one, with higher being more important.

Type:

float

class Span(context: Context | None = None)[source]

Bases: Tracer, AbstractContextManager[Span]

Captures a logical step within the overall workflow.

Logs and other spans can be nested underneath.

Can also be used as a context manager to easily capture the start and end time, and keep the span only in scope while it is open.

context

The context of the current span. If the span is a root span, the trace id will be equal to its span id.

Type:

pharia_inference_sdk.core.tracer.tracer.Context

status_code

Status of the span. Will be “OK” unless the span was interrupted by an exception.

abstract end(timestamp: datetime | None = None) None[source]

Marks the Span as closed and sets the end time.

The Span should be regarded as complete, and no further logging should happen with it.

Ending a closed span in undefined behavior.

Parameters:

timestamp – Optional override of the timestamp. Defaults to call time.

abstract export_for_viewing() Sequence[ExportedSpan]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

abstract log(message: str, value: PydanticSerializable, timestamp: datetime | None = None) None[source]

Record a log of relevant information as part of a step within a task.

By default, the Input and Output of each Task are logged automatically, but you can log anything else that seems relevant to understanding the process of a given task.

Logging to closed spans is undefined behavior.

Parameters:
  • message – A description of the value you are logging, such as the step in the task this is related to.

  • value – The relevant data you want to log. Can be anything that is serializable by Pydantic, which gives the tracers flexibility in how they store and emit the logs.

  • timestamp – optional override of the timestamp. Otherwise should default to now

abstract span(name: str, timestamp: datetime | None = None) Span

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

abstract task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) TaskSpan

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

class SpanAttributes(*, type: Literal[SpanType.SPAN] = SpanType.SPAN)[source]

Bases: BaseModel

class SpanStatus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

class SpanType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

class StartSpan(*, uuid: UUID, parent: UUID, name: str, start: datetime, trace_id: UUID)[source]

Bases: BaseModel

Represents the payload/entry of a log-line indicating that a Span was opened through Tracer.span.

uuid

A unique id for the opened Span.

Type:

uuid.UUID

parent

The unique id of the parent element of opened TaskSpan. This could refer to either a surrounding TaskSpan, Span or the top-level Tracer.

Type:

uuid.UUID

name

The name of the task.

Type:

str

start

The timestamp when this Span was started.

Type:

datetime.datetime

trace_id

The ID of the trace this span belongs to.

Type:

uuid.UUID

class StartTask(*, uuid: UUID, parent: UUID, name: str, start: datetime, input: Annotated[PydanticSerializable, SerializeAsAny()], trace_id: UUID)[source]

Bases: BaseModel

Represents the payload/entry of a log-line indicating that a TaskSpan was opened through Tracer.task_span.

uuid

A unique id for the opened TaskSpan.

Type:

uuid.UUID

parent

The unique id of the parent element of opened TaskSpan. This could refer to either a surrounding TaskSpan, Span or the top-level Tracer.

Type:

uuid.UUID

name

The name of the task.

Type:

str

start

The timestamp when this Task was started (i.e. run was called).

Type:

datetime.datetime

input

The Input (i.e. parameter for run) the Task was started with.

Type:

PydanticSerializable

trace_id

The trace id of the opened TaskSpan.

Type:

uuid.UUID

class Task[source]

Bases: ABC, Generic[Input, Output]

Base task interface. This may consist of several sub-tasks to accomplish the given task.

Generics:
Input: Interface to be passed to the task with all data needed to run the process.

Ideally, these are specified in terms related to the use-case, rather than lower-level configuration options.

Output: Interface of the output returned by the task.

abstract do_run(input: Input, task_span: TaskSpan) Output[source]

The implementation for this use case.

This takes an input and runs the implementation to generate an output. It takes a Span for tracing of the process. The Input and Output are logged by default.

Parameters:
  • input – Generic input defined by the task implementation

  • task_span – The Span used for tracing.

Returns:

Generic output defined by the task implementation.

final run(input: Input, tracer: Tracer) Output[source]

Executes the implementation of do_run for this use case.

This takes an input and runs the implementation to generate an output. It takes a Tracer for tracing of the process. The Input and Output are logged by default.

Parameters:
  • input – Generic input defined by the task implementation

  • tracer – The Tracer used for tracing.

Returns:

Generic output defined by the task implementation.

final run_concurrently(inputs: Iterable[Input], tracer: Tracer, concurrency_limit: int = 20) Sequence[Output][source]

Executes multiple processes of this task concurrently.

Each provided input is potentially executed concurrently to the others. There is a global limit on the number of concurrently executed tasks that is shared by all tasks of all types.

Parameters:
  • inputs – The inputs that are potentially processed concurrently.

  • tracer – The tracer passed on the run method when executing a task.

  • concurrency_limit – An optional additional limit for the number of concurrently executed task for this method call. This can be used to prevent queue-full or similar error of downstream APIs when the global concurrency limit is too high for a certain task.

Returns:

The Outputs generated by calling run for each given Input. The order of Outputs corresponds to the order of the Inputs.

class TaskSpan(context: Context | None = None)[source]

Bases: Span

Specialized span for instrumenting Task input, output, and nested spans and logs.

Generating this TaskSpan should capture the Task input, as well as the task name.

Can also be used as a Context Manager to easily capture the start and end time of the task, and keep the span only in scope while it is active

abstract end(timestamp: datetime | None = None) None

Marks the Span as closed and sets the end time.

The Span should be regarded as complete, and no further logging should happen with it.

Ending a closed span in undefined behavior.

Parameters:

timestamp – Optional override of the timestamp. Defaults to call time.

abstract export_for_viewing() Sequence[ExportedSpan]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

abstract log(message: str, value: PydanticSerializable, timestamp: datetime | None = None) None

Record a log of relevant information as part of a step within a task.

By default, the Input and Output of each Task are logged automatically, but you can log anything else that seems relevant to understanding the process of a given task.

Logging to closed spans is undefined behavior.

Parameters:
  • message – A description of the value you are logging, such as the step in the task this is related to.

  • value – The relevant data you want to log. Can be anything that is serializable by Pydantic, which gives the tracers flexibility in how they store and emit the logs.

  • timestamp – optional override of the timestamp. Otherwise should default to now

abstract record_output(output: PydanticSerializable) None[source]

Record Task output.

Since a Context Manager can’t capture output in the __exit__ method, output should be captured once it is generated.

Parameters:

output – The output of the task that is being logged.

abstract span(name: str, timestamp: datetime | None = None) Span

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

abstract task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) TaskSpan

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

class TaskSpanAttributes(*, type: Literal[SpanType.TASK_SPAN] = SpanType.TASK_SPAN, input: Annotated[PydanticSerializable, SerializeAsAny()], output: Annotated[PydanticSerializable, SerializeAsAny()])[source]

Bases: BaseModel

class TextCursor(item: int, position: int)[source]

Bases: object

Defines a position with a Text prompt item.

Parameters:
  • item – the index of the prompt item within the Prompt

  • position – the character position in the text of the item.

Example: >>> from aleph_alpha_client import Prompt >>> from core import TextCursor >>> prompt = Prompt.from_text(“This is a text”) >>> # This denotes the “i” in “is” in the text-item of the Prompt above >>> cursor = TextCursor(item=0, position=5)

class TextHighlight(model: AlephAlphaModel, granularity: PromptGranularity | None = None, threshold: float = 0.1, clamp: bool = False)[source]

Bases: Task[TextHighlightInput, TextHighlightOutput]

Generates text highlights given a prompt and completion.

For a given prompt and target (completion), extracts the parts of the prompt responsible for generation. The prompt can only contain text. A range can be provided via use of the liquid language (see the example). In this case, the highlights will only refer to text within this range.

Parameters:
  • model – The model used throughout the task for model related API calls.

  • granularity – At which granularity should the target be explained in terms of the prompt.

  • threshold – After normalization, everything highlight below this value will be dropped.

  • clamp – Control whether highlights should be clamped to a focus range if they intersect it.

Example

>>> import os
>>> from core import (
...     InMemoryTracer,
...     PromptTemplate,
...     TextHighlight,
...     TextHighlightInput,
...     AlephAlphaModel
... )
>>> model = AlephAlphaModel(name="luminous-base")
>>> text_highlight = TextHighlight(model=model)
>>> prompt_template_str = (
...             "{% promptrange r1 %}Question: What is 2 + 2?{% endpromptrange %}\nAnswer:"
...     )
>>> template = PromptTemplate(prompt_template_str)
>>> rich_prompt = template.to_rich_prompt()
>>> completion = " 4."
>>> model = "luminous-base"
>>> input = TextHighlightInput(
...         rich_prompt=rich_prompt, target=completion, focus_ranges=frozenset({"r1"})
... )
>>> output = text_highlight.run(input, InMemoryTracer())
do_run(input: TextHighlightInput, task_span: TaskSpan) TextHighlightOutput[source]

The implementation for this use case.

This takes an input and runs the implementation to generate an output. It takes a Span for tracing of the process. The Input and Output are logged by default.

Parameters:
  • input – Generic input defined by the task implementation

  • task_span – The Span used for tracing.

Returns:

Generic output defined by the task implementation.

run(input: Input, tracer: Tracer) Output

Executes the implementation of do_run for this use case.

This takes an input and runs the implementation to generate an output. It takes a Tracer for tracing of the process. The Input and Output are logged by default.

Parameters:
  • input – Generic input defined by the task implementation

  • tracer – The Tracer used for tracing.

Returns:

Generic output defined by the task implementation.

run_concurrently(inputs: Iterable[Input], tracer: Tracer, concurrency_limit: int = 20) Sequence[Output]

Executes multiple processes of this task concurrently.

Each provided input is potentially executed concurrently to the others. There is a global limit on the number of concurrently executed tasks that is shared by all tasks of all types.

Parameters:
  • inputs – The inputs that are potentially processed concurrently.

  • tracer – The tracer passed on the run method when executing a task.

  • concurrency_limit – An optional additional limit for the number of concurrently executed task for this method call. This can be used to prevent queue-full or similar error of downstream APIs when the global concurrency limit is too high for a certain task.

Returns:

The Outputs generated by calling run for each given Input. The order of Outputs corresponds to the order of the Inputs.

class TextHighlightInput(*, rich_prompt: RichPrompt, target: str, focus_ranges: frozenset[str] = frozenset({}))[source]

Bases: BaseModel

The input for a text highlighting task.

rich_prompt

From client’s PromptTemplate. Includes both the actual ‘Prompt’ as well as text range information. Supports liquid-template-language-style {% promptrange range_name %}/{% endpromptrange %} for range.

Type:

pharia_inference_sdk.core.prompt_template.RichPrompt

target

The target that should be explained. Expected to follow the prompt.

Type:

str

focus_ranges

The ranges contained in rich_prompt the returned highlights stem from. That means that each returned highlight overlaps with at least one character with one of the ranges listed here. If this set is empty highlights of the entire prompt are returned.

Type:

frozenset[str]

class TextHighlightOutput(*, highlights: Sequence[ScoredTextHighlight])[source]

Bases: BaseModel

The output of a text highlighting task.

highlights

A sequence of ‘ScoredTextHighlight’s.

Type:

collections.abc.Sequence[pharia_inference_sdk.core.text_highlight.ScoredTextHighlight]

class Token(*, token: str, token_id: int)[source]

Bases: BaseModel

A token class containing it’s id and the raw token.

This is used instead of the Aleph Alpha client Token class since this one is serializable, while the one from the client is not.

class TokenWithLogProb(*, token: Token, prob: LogProb)[source]

Bases: BaseModel

class Tracer[source]

Bases: ABC

Provides a consistent way to instrument a Task with logging for each step of the workflow.

A tracer needs to provide a way to collect an individual log, which should be serializable, and a way to generate nested spans, so that sub-tasks can emit logs that are grouped together.

Implementations of how logs are collected and stored may differ. Refer to the individual documentation of each implementation to see how to use the resulting tracer.

abstract export_for_viewing() Sequence[ExportedSpan][source]

Converts the trace to a format that can be read by Pharia Studio.

The format is inspired by the OpenTelemetry Format, but does not abide by it. Specifically, it cuts away unused concepts, such as links.

Returns:

A list of spans which includes the current span and all its child spans.

abstract span(name: str, timestamp: datetime | None = None) Span[source]

Generate a span from the current span or logging instance.

Allows for grouping multiple logs and duration together as a single, logical step in the process.

Each tracer implementation can decide on how it wants to represent this, but they should all capture the hierarchical nature of nested spans, as well as the idea of the duration of the span.

Parameters:
  • name – A descriptive name of what this span will contain logs about.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a Span.

abstract task_span(task_name: str, input: PydanticSerializable, timestamp: datetime | None = None) TaskSpan[source]

Generate a task-specific span from the current span or logging instance.

Allows for grouping multiple logs together, as well as the task’s specific input, output, and duration.

Each tracer implementation can decide on how it wants to represent this, but they should all allow for representing logs of a span within the context of a parent span.

Parameters:
  • task_name – The name of the task that is being logged

  • input – The input for the task that is being logged.

  • timestamp – Override of the starting timestamp. Defaults to call time.

Returns:

An instance of a TaskSpan.

exception TracerLogEntryFailed(error_message: str, id: str)[source]

Bases: Exception

add_note()

Exception.add_note(note) – add a note to the exception

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

utc_now() datetime[source]

Return datetime object with utc timezone.

datetime.utcnow() returns a datetime object without timezone, so this function is preferred.

Connectors Module

class AlephAlphaClientProtocol(*args, **kwargs)[source]

Bases: Protocol

batch_semantic_embed(request: BatchSemanticEmbeddingRequest, model: str | None = None) BatchSemanticEmbeddingResponse[source]
complete(request: CompletionRequest, model: str) CompletionResponse[source]
detokenize(request: DetokenizationRequest, model: str) DetokenizationResponse[source]
embed(request: EmbeddingRequest, model: str) EmbeddingResponse[source]
evaluate(request: EvaluationRequest, model: str) EvaluationResponse[source]
explain(request: ExplanationRequest, model: str) ExplanationResponse[source]
get_version() str[source]
models() Sequence[Mapping[str, Any]][source]
semantic_embed(request: SemanticEmbeddingRequest, model: str) SemanticEmbeddingResponse[source]
tokenize(request: TokenizationRequest, model: str) TokenizationResponse[source]
tokenizer(model: str) Tokenizer[source]
class LimitedConcurrencyClient(client: AlephAlphaClientProtocol, max_concurrency: int = 10, max_retry_time: int = 180)[source]

Bases: object

An Aleph Alpha Client wrapper that limits the number of concurrent requests.

This just delegates each call to the wrapped Aleph Alpha Client and ensures that never more than a given number of concurrent calls are executed against the API.

Parameters:
  • client – The wrapped Client.

  • max_concurrency – the maximal number of requests that may run concurrently against the API. Defaults to 10.

  • max_retry_time – the maximal time in seconds a complete is retried in case a BusyError is raised.

batch_semantic_embed(request: BatchSemanticEmbeddingRequest, model: str | None = None) BatchSemanticEmbeddingResponse[source]
complete(request: CompletionRequest, model: str) CompletionResponse[source]
detokenize(request: DetokenizationRequest, model: str) DetokenizationResponse[source]
embed(request: EmbeddingRequest, model: str) EmbeddingResponse[source]
evaluate(request: EvaluationRequest, model: str) EvaluationResponse[source]
explain(request: ExplanationRequest, model: str) ExplanationResponse[source]
classmethod from_env(token: str | None = None, host: str | None = None) LimitedConcurrencyClient[source]

This is a helper method to construct your client with default settings from a token and host.

Parameters:
  • token – An Aleph Alpha token to instantiate the client. If no token is provided, this method tries to fetch it from the environment under the name of “AA_TOKEN”.

  • host – The host that is used for requests. If no token is provided, this method tries to fetch it from the environment under the name of “CLIENT_URL”. If this is not present, it defaults to the Aleph Alpha Api. If you have an on premise setup, change this to your host URL.

Returns:

A LimitedConcurrencyClient

get_version() str[source]
models() Sequence[Mapping[str, Any]][source]
semantic_embed(request: SemanticEmbeddingRequest, model: str) SemanticEmbeddingResponse[source]
tokenize(request: TokenizationRequest, model: str) TokenizationResponse[source]
tokenizer(model: str) Tokenizer[source]