API Reference

Client Classes

Provides an asynchronous client for interacting with the Venice.ai API.

This client provides a complete interface for making asynchronous requests to all Venice AI API endpoints. It handles authentication, request formation, response parsing, and error management through a clean, resource-oriented design.

The client follows a namespaced resource pattern: each API capability is exposed through a dedicated resource object (e.g., chat, models, image). This keeps concerns separate and makes the API easier to discover and navigate.

Parameters:

api_key (str) – Your Venice.ai API key. This is required for authentication.
base_url (Optional[Union[str, httpx.URL]]) – Overrides the default base URL. Defaults to the Venice AI production API URL. Useful for testing against different environments.
timeout (Optional[Union[float, httpx.Timeout]]) – Request timeout in seconds or as a detailed httpx.Timeout object for more granular control. Defaults to 60.0 seconds.
default_timeout (Optional[httpx.Timeout]) – Global default timeout for all API calls made by this client instance. If provided, this will be used as the default timeout for all requests unless overridden on a per-request basis. Takes precedence over the timeout parameter.
max_retries (int) – Maximum number of retries for connection errors or transient failures. This parameter controls the total number of retries for the httpx-retries mechanism. Defaults to 2.
retry_backoff_factor (float) – Backoff factor for retry delays. Defaults to 0.5.
retry_status_forcelist (Optional[List[int]]) – List of HTTP status codes to retry on. Defaults to [429, 500, 502, 503, 504].
retry_respect_retry_after_header (bool) – Whether to respect Retry-After headers. Defaults to True.
http_client (Optional[httpx.AsyncClient]) –
An optional pre-configured httpx.AsyncClient instance to use for HTTP requests. If provided:
- The SDK will use this custom client directly.
- The SDK will still configure base_url (from the base_url parameter or default), timeout (from default_timeout or timeout parameter), and Authorization headers on this provided client instance.
- All other HTTP-related parameters passed to this constructor (e.g., max_retries, retry_backoff_factor, proxy, transport, limits, verify, etc.) will be ignored. It is assumed that the provided http_client is already configured with these aspects.
- You are responsible for managing the lifecycle of the provided http_client (e.g., closing it via await http_client.aclose()).
If not provided, the SDK will create and manage its own internal httpx.AsyncClient.
proxy (Optional[Union[str, httpx.URL, httpx.Proxy]]) – Proxy configuration for HTTP requests. Only used when http_client is not provided.
transport (Optional[httpx.BaseTransport]) – Custom transport for HTTP requests. Only used when http_client is not provided.
limits (Optional[httpx.Limits]) – Connection limits configuration. Only used when http_client is not provided.
cert (Optional[Union[str, Tuple[str, str]]]) – Client certificate configuration. Only used when http_client is not provided.
verify (Optional[Union[bool, str, ssl.SSLContext]]) – SSL certificate verification. Only used when http_client is not provided.
trust_env (Optional[bool]) – Whether to trust environment variables for proxy configuration. Only used when http_client is not provided.
http1 (Optional[bool]) – Whether to enable HTTP/1.1. Only used when http_client is not provided.
http2 (Optional[bool]) – Whether to enable HTTP/2. Only used when http_client is not provided.
default_encoding (Optional[Union[str, Callable[[bytes], str]]]) – Default encoding for response content. Only used when http_client is not provided.
event_hooks (Optional[Mapping[str, List[Callable[..., Any]]]]) – Event hooks for request/response lifecycle. Only used when http_client is not provided.
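
The precedence rules described in the parameter list (default_timeout over timeout, and HTTP options being ignored when http_client is supplied) can be sketched in plain Python. The function resolve_client_options and the dictionary it returns are hypothetical and not part of the SDK; timeouts are simplified to floats, and only the rules themselves come from the descriptions above.

```python
from typing import Any, Dict, Optional

# Options the parameter list says are ignored when http_client is supplied.
# Illustrative subset; the names mirror the constructor parameters.
IGNORED_WITH_CUSTOM_CLIENT = (
    "max_retries", "retry_backoff_factor", "proxy", "transport", "limits", "verify",
)


def resolve_client_options(
    *,
    timeout: Optional[float] = None,
    default_timeout: Optional[float] = None,
    http_client: Optional[object] = None,
    **http_options: Any,
) -> Dict[str, Any]:
    """Hypothetical helper that mirrors the documented precedence rules."""
    # default_timeout wins over timeout; 60.0 is the documented default.
    if default_timeout is not None:
        effective_timeout = default_timeout
    elif timeout is not None:
        effective_timeout = timeout
    else:
        effective_timeout = 60.0

    if http_client is not None:
        # A caller-supplied client keeps its own transport-level configuration.
        applied: Dict[str, Any] = {}
        ignored = sorted(k for k in http_options if k in IGNORED_WITH_CUSTOM_CLIENT)
    else:
        applied = dict(http_options)
        ignored = []

    return {"timeout": effective_timeout, "applied": applied, "ignored": ignored}


print(resolve_client_options(timeout=30.0, default_timeout=10.0))
print(resolve_client_options(http_client=object(), max_retries=5, proxy="http://localhost:8080"))
```

The second call shows why retry and proxy settings belong on the custom client itself: in this sketch they land in the ignored list instead of being applied.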

chat

Access to chat-related endpoints.

Type: AsyncChatResource

models

Access to model listing and information endpoints.

Type: AsyncModels

image

Access to image generation and manipulation endpoints.

Type: AsyncImage

audio

Access to speech synthesis and audio processing endpoints.

Type: AsyncAudio

billing

Access to billing and usage information endpoints.

Type: AsyncBilling

embeddings

Access to embedding generation endpoints.

Type: AsyncEmbeddings

api_keys

Access to API key management endpoints.

Type: AsyncApiKeys

characters

Access to character management endpoints.

Type: AsyncCharacters

Examples

Basic usage:

import asyncio

from venice_ai import AsyncVeniceClient

async def main() -> None:
    # "async with" is only valid inside a coroutine, so the call lives in main().
    async with AsyncVeniceClient(api_key="your-api-key") as client:
        response = await client.chat.completions.create(
            model="venice-1",
            messages=[{"role": "user", "content": "Hello, world!"}]
        )
        print(response["choices"][0]["message"]["content"])

asyncio.run(main())

Streaming example:

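import asyncio

from venice_ai import AsyncVeniceClient

async def main() -> None:
    # "async with" and "async for" are only valid inside a coroutine.
    async with AsyncVeniceClient(api_key="your-api-key") as client:
        async for chunk in client.chat.completions.create(
            model="venice-1",
            messages=[{"role": "user", "content": "Count to 5"}],
            stream=True
        ):
            # Some chunks carry no text, so default to an empty string.
            content = chunk["choices"][0]["delta"].get("content", "")
            if content:
                print(content, end="", flush=True)

asyncio.run(main())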
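
When the full reply is needed as one string instead of being printed piece by piece, the chunks can be accumulated. The sketch below runs without the SDK: fake_stream is a made-up stand-in for the streaming call, and its chunks only imitate the shape used in the example above (choices[0]["delta"] with an optional "content" key).

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


async def fake_stream() -> AsyncIterator[Dict[str, Any]]:
    """Stand-in for the SDK stream; yields chunks shaped like the example above."""
    for piece in ["1", " 2", " 3", None]:
        # The final chunk has an empty delta, as a stream terminator might.
        delta = {} if piece is None else {"content": piece}
        yield {"choices": [{"delta": delta}]}


async def collect_text(stream: AsyncIterator[Dict[str, Any]]) -> str:
    """Join the text fragments of every chunk into a single string."""
    parts: List[str] = []
    async for chunk in stream:
        content = chunk["choices"][0]["delta"].get("content", "")
        if content:
            parts.append(content)
    return "".join(parts)


full_text = asyncio.run(collect_text(fake_stream()))
print(full_text)
```

Collecting fragments in a list and joining once at the end avoids repeated string concatenation on long responses.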

Using with a custom httpx client:

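import asyncio

import httpx
from venice_ai import AsyncVeniceClient

async def main() -> None:
    # Create a custom client with specific configurations.
    # httpx.Timeout needs either a default value or all four of connect,
    # read, write, and pool, so pool is set explicitly here.
    custom_client = httpx.AsyncClient(
        timeout=httpx.Timeout(connect=5.0, read=30.0, write=10.0, pool=5.0),
        follow_redirects=True,
        http2=True  # requires the optional h2 package: pip install "httpx[http2]"
    )
    try:
        # Use the custom client with AsyncVeniceClient
        async with AsyncVeniceClient(
            api_key="your-api-key",
            http_client=custom_client
        ) as client:
            # Your API operations here
            pass
    finally:
        # The SDK does not close a caller-supplied client, so close it here.
        await custom_client.aclose()

asyncio.run(main())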

Raises: ValueError – If api_key is empty or None.

Note

When used as an async context manager (in an async with block), the client automatically closes the underlying HTTP client on exit and frees its resources. For manual resource management, call the close() method.

Parameters:

http_transport_options (typing.Optional[typing.Dict[str, typing.Any]])
async_transport (typing.Union[httpx.AsyncBaseTransport, venice_ai.utils.NotGivenType, None])
follow_redirects (typing.Union[bool, venice_ai.utils.NotGivenType, None])
max_redirects (typing.Union[int, venice_ai.utils.NotGivenType, None])

async aclose()[source]

Close the underlying asynchronous HTTP client and free all associated resources.

This is an alias for the close() method, following the conventional async naming pattern where async methods are prefixed with ‘a’. This method performs the same cleanup as close() - it closes the internal httpx.AsyncClient and any associated resources.

Returns: None
Return type: None

property api_key: str

Get the API key for authentication.

Returns the explicitly set API key, or falls back to the VENICE_API_KEY environment variable if no key was explicitly provided.

Returns: The API key to use for authentication.
Return type: str
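
The fallback order described for api_key can be sketched as a small property. KeyHolder is a hypothetical stand-in, not the SDK class; raising ValueError when neither source provides a key is an assumption borrowed from the constructor's documented behavior.

```python
import os
from typing import Optional


class KeyHolder:
    """Hypothetical stand-in showing the documented fallback order."""

    def __init__(self, api_key: Optional[str] = None) -> None:
        self._api_key = api_key

    @property
    def api_key(self) -> str:
        # Prefer the explicitly provided key; otherwise read VENICE_API_KEY.
        key = self._api_key or os.environ.get("VENICE_API_KEY")
        if not key:
            # Assumption: mirror the constructor's ValueError for a missing key.
            raise ValueError("No API key provided and VENICE_API_KEY is not set")
        return key


os.environ["VENICE_API_KEY"] = "key-from-environment"
print(KeyHolder().api_key)
print(KeyHolder("explicit-key").api_key)
```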

build_request(method: str, path: str, *, json_data: Mapping[str, Any] | None = None, headers: Mapping[str, str] | None = None, params: Mapping[str, Any] | None = None)[source]

Build a request with proper headers including authentication.

This method constructs the headers for a request, merging authentication headers with any provided headers. The authentication header is derived from the current api_key value.

Parameters:

method (str) – HTTP method for the request.
path (str) – API endpoint path relative to the base URL.
json_data (Optional[Mapping[str, Any]]) – JSON-serializable request body.
headers (Optional[Mapping[str, str]]) – Additional HTTP headers to include.
params (Optional[Mapping[str, Any]]) – URL query parameters.

Returns:

Dictionary containing the built request information.

Return type:

Dict[str, Any]
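
A minimal sketch of the header merge that build_request describes, written as a free function so that it runs without the SDK. The Bearer scheme, the keys of the returned dictionary, and the choice to let caller headers override defaults are all assumptions made for illustration; the real method may differ on each point.

```python
from typing import Any, Dict, Mapping, Optional


def build_request_sketch(
    api_key: str,
    method: str,
    path: str,
    *,
    json_data: Optional[Mapping[str, Any]] = None,
    headers: Optional[Mapping[str, str]] = None,
    params: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for build_request (not the SDK implementation)."""
    # Assumption: the API key is sent as a Bearer token.
    merged: Dict[str, str] = {"Authorization": f"Bearer {api_key}"}
    # Assumption: caller-supplied headers are merged on top of the defaults.
    merged.update(headers or {})
    return {
        "method": method,
        "path": path,
        "headers": merged,
        "json": dict(json_data) if json_data is not None else None,
        "params": dict(params) if params is not None else None,
    }


request = build_request_sketch("your-api-key", "GET", "models", headers={"X-Trace": "1"})
print(request["headers"])
```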

async close()[source]

Close the underlying asynchronous HTTP client and free all associated resources.

This method performs cleanup of the internal httpx.AsyncClient and any associated resources such as connection pools, SSL contexts, and background tasks. It should be called when the Venice AI client is no longer needed to ensure proper resource cleanup and prevent resource leaks.

When using the client as an async context manager (with async with), this method is called automatically upon exiting the context, so manual cleanup is not required. For manual resource management, this method should be called explicitly.

The method is designed to be idempotent - it can be called multiple times safely. Only the first call will actually perform the cleanup; subsequent calls will be no-ops. This prevents errors if cleanup is attempted multiple times.

Note

If a user-provided httpx.AsyncClient was passed to the constructor, this method will not close it, as the user is responsible for managing the lifecycle of their own client.

After calling this method, the client should not be used for making further API requests. Attempting to use a closed client may result in errors or undefined behavior.
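
The close() semantics described above (idempotent, and leaving a caller-supplied HTTP client open) can be sketched with stand-in classes. FakeHTTPClient and ClientSketch are hypothetical; they only model the ownership rule and do not reflect how the SDK is implemented internally.

```python
import asyncio
from typing import Optional, Tuple


class FakeHTTPClient:
    """Stand-in for httpx.AsyncClient that counts aclose() calls."""

    def __init__(self) -> None:
        self.close_calls = 0

    async def aclose(self) -> None:
        self.close_calls += 1


class ClientSketch:
    """Hypothetical client that closes only the HTTP client it created itself."""

    def __init__(self, http_client: Optional[FakeHTTPClient] = None) -> None:
        self._owns_client = http_client is None
        self._http = http_client if http_client is not None else FakeHTTPClient()
        self._closed = False

    async def close(self) -> None:
        if self._closed:
            return  # later calls are no-ops
        self._closed = True
        if self._owns_client:
            await self._http.aclose()

    async def __aenter__(self) -> "ClientSketch":
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        await self.close()


async def demo() -> Tuple[int, int]:
    owned = ClientSketch()
    async with owned:
        pass
    await owned.close()  # second call does nothing

    external = FakeHTTPClient()
    async with ClientSketch(http_client=external):
        pass
    return owned._http.close_calls, external.close_calls


result = asyncio.run(demo())
print(result)
```

In this sketch the internally created client is closed exactly once, while the external one is never closed and remains the caller's responsibility.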

Return type: None

async delete(path: str, *, cast_to: Type[venice_ai._async_client.T] | None = None, **kwargs)[source]

Make an asynchronous DELETE request to the specified API endpoint.

This is a convenience method that wraps the lower-level _request method specifically for DELETE requests. It handles proper header configuration for DELETE requests and provides a clean interface for resource deletion operations.

Parameters:

path (str) – API endpoint path relative to the client’s base URL. Should not include a leading slash as it will be properly joined with the base URL.
cast_to (Optional[Type[T]]) – Optional Pydantic model to cast the response to.
kwargs – Additional keyword arguments to pass to the underlying _request method. This can include options like headers, params, timeout, or raw_response.

Returns:

Parsed JSON response data as Python objects (typically dict or list). Many DELETE endpoints return confirmation data or the deleted resource details.

Return type:

Any

Raises:

venice_ai.exceptions.InvalidRequestError – For HTTP 400 errors indicating invalid request parameters.
venice_ai.exceptions.AuthenticationError – For HTTP 401 errors indicating invalid or missing API key.
venice_ai.exceptions.PermissionDeniedError – For HTTP 403 errors indicating insufficient permissions.
venice_ai.exceptions.NotFoundError – For HTTP 404 errors indicating the requested resource was not found.
venice_ai.exceptions.RateLimitError – For HTTP 429 errors indicating rate limit exceeded.
venice_ai.exceptions.InternalServerError – For HTTP 5xx errors indicating server-side problems.
venice_ai.exceptions.APITimeoutError – If the request times out before completion.
venice_ai.exceptions.APIConnectionError – For network connectivity issues or connection failures.
venice_ai.exceptions.APIError – For other API-related errors not covered by specific exceptions.
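
The status-code mapping in the Raises list can be summarized as a lookup. The exception classes below are local stand-ins that reuse the documented names so the sketch runs without the SDK; making them subclasses of APIError is an assumption, and error_for_status is a hypothetical helper.

```python
from typing import Dict, Type


class APIError(Exception):
    """Local stand-in for venice_ai.exceptions.APIError."""


class InvalidRequestError(APIError): ...
class AuthenticationError(APIError): ...
class PermissionDeniedError(APIError): ...
class NotFoundError(APIError): ...
class RateLimitError(APIError): ...
class InternalServerError(APIError): ...


STATUS_TO_ERROR: Dict[int, Type[APIError]] = {
    400: InvalidRequestError,
    401: AuthenticationError,
    403: PermissionDeniedError,
    404: NotFoundError,
    429: RateLimitError,
}


def error_for_status(status_code: int) -> Type[APIError]:
    """Pick the exception class the Raises list associates with a status code."""
    if status_code in STATUS_TO_ERROR:
        return STATUS_TO_ERROR[status_code]
    if 500 <= status_code <= 599:
        return InternalServerError
    return APIError  # anything else falls back to the generic error


for code in (401, 404, 429, 503, 418):
    print(code, error_for_status(code).__name__)
```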

async get(path: str, *, params: Mapping[str, Any] | None = None, cast_to: Type[venice_ai._async_client.T] | None = None, **kwargs)[source]

Make an asynchronous GET request to the specified API endpoint.

This is a convenience method that wraps the lower-level _request method specifically for GET requests. It automatically handles proper header configuration for GET requests (removing Content-Type headers) and provides a clean interface for retrieving data from the API.

Parameters:

path (str) – API endpoint path relative to the client’s base URL. Should not include a leading slash as it will be properly joined with the base URL.
params (Optional[Mapping[str, Any]]) – URL query parameters to include in the request. These will be properly URL-encoded and appended to the request URL.
cast_to (Optional[Type[T]]) – Optional Pydantic model to cast the response to.
kwargs – Additional keyword arguments to pass to the underlying _request method. This can include options like headers, timeout, or raw_response.

Returns:

Parsed JSON response data as Python objects (typically dict or list).

Return type:

Any

Raises:

venice_ai.exceptions.InvalidRequestError – For HTTP 400 errors indicating invalid request parameters.
venice_ai.exceptions.AuthenticationError – For HTTP 401 errors indicating invalid or missing API key.
venice_ai.exceptions.PermissionDeniedError – For HTTP 403 errors indicating insufficient permissions.
venice_ai.exceptions.NotFoundError – For HTTP 404 errors indicating the requested resource was not found.
venice_ai.exceptions.RateLimitError – For HTTP 429 errors indicating rate limit exceeded.
venice_ai.exceptions.InternalServerError – For HTTP 5xx errors indicating server-side problems.
venice_ai.exceptions.APITimeoutError – If the request times out before completion.
venice_ai.exceptions.APIConnectionError – For network connectivity issues or connection failures.
venice_ai.exceptions.APIError – For other API-related errors not covered by specific exceptions.
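
To illustrate the path and params descriptions for get(), the sketch below builds a URL with the standard library. BASE_URL is a placeholder, not the real Venice endpoint, and urljoin follows generic RFC 3986 resolution, which may differ from how the SDK joins paths; it does show why a leading slash can be risky, since under those rules it discards the base path.

```python
from typing import Any, Mapping, Optional
from urllib.parse import urlencode, urljoin

# Placeholder base URL for illustration only.
BASE_URL = "https://api.example.invalid/api/v1/"


def build_url(path: str, params: Optional[Mapping[str, Any]] = None) -> str:
    """Resolve a relative path against BASE_URL and append encoded query params."""
    url = urljoin(BASE_URL, path)
    if params:
        url = f"{url}?{urlencode(params)}"
    return url


print(build_url("models", {"type": "text"}))
print(build_url("models", {"q": "a b"}))  # the space is encoded as "+"
print(build_url("/models"))               # leading slash drops "/api/v1"
```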

async get_model_pricing(model_id: str)[source]

Get pricing information for a specific model.

Retrieves the pricing structure for a given model ID, including both USD and VCU (Venice Compute Units) costs for input and output tokens.

Parameters: model_id (str) – The ID of the model to get pricing for
Returns: Pricing information for the model
Return type: ModelPricing
Raises: ValueError – If the model is not found

Example

>>> async with AsyncVeniceClient(api_key="your-api-key") as client:
...     pricing = await client.get_model_pricing("llama-3.3-70b")
...     print(f"Input: ${pricing['input']['usd']}/1k tokens")
...     print(f"Output: ${pricing['output']['usd']}/1k tokens")
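
Once pricing has been fetched, estimating the cost of a request is simple arithmetic. The sketch assumes, as the print statements in the example suggest, that prices are quoted per 1k tokens and that pricing exposes input and output entries with a usd field. The numbers in sample_pricing are made up, and estimate_cost_usd is a hypothetical helper.

```python
from typing import Any, Mapping


def estimate_cost_usd(
    pricing: Mapping[str, Any], input_tokens: int, output_tokens: int
) -> float:
    """Estimate request cost, assuming prices are quoted per 1k tokens."""
    input_cost = (input_tokens / 1000) * pricing["input"]["usd"]
    output_cost = (output_tokens / 1000) * pricing["output"]["usd"]
    return input_cost + output_cost


# Made-up prices, for illustration only.
sample_pricing = {"input": {"usd": 0.70}, "output": {"usd": 2.80}}

cost = estimate_cost_usd(sample_pricing, input_tokens=1000, output_tokens=500)
print(f"Estimated cost: ${cost:.2f}")
```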

async post(path: str, *, json_data: Mapping[str, Any] | None = None, timeout: float | httpx.Timeout | None = None, cast_to: Type[venice_ai._async_client.T] | None = None, **kwargs)[source]

Make an asynchronous POST request to the specified API endpoint.

This is a convenience method that wraps the lower-level _request method specifically for POST requests. It handles JSON serialization of the request body and ensures proper Content-Type headers are set for JSON requests.

Parameters:

path (str) – API endpoint path relative to the client’s base URL. Should not include a leading slash as it will be properly joined with the base URL.
json_data (Optional[Mapping[str, Any]]) – JSON-serializable data to send in the request body. This will be automatically serialized to JSON and sent with Content-Type: application/json headers. Can include any data structure that is JSON-serializable (dict, list, primitives).
timeout (Optional[Union[float, httpx.Timeout]]) – Request timeout configuration. Can be a float specifying timeout in seconds, or an httpx.Timeout object for granular timeout control. If not provided, uses the client’s default timeout setting.
cast_to (Optional[Type[T]]) – Optional Pydantic model to cast the response to.
kwargs – Additional keyword arguments to pass to the underlying _request method. This can include options like headers, params, or raw_response.

Returns:

Parsed JSON response data as Python objects (typically dict or list).

Return type:

Any

Raises:

venice_ai.exceptions.InvalidRequestError – For HTTP 400 errors indicating invalid request parameters.
venice_ai.exceptions.AuthenticationError – For HTTP 401 errors indicating invalid or missing API key.
venice_ai.exceptions.PermissionDeniedError – For HTTP 403 errors indicating insufficient permissions.
venice_ai.exceptions.NotFoundError – For HTTP 404 errors indicating the requested resource was not found.
venice_ai.exceptions.RateLimitError – For HTTP 429 errors indicating rate limit exceeded.
venice_ai.exceptions.InternalServerError – For HTTP 5xx errors indicating server-side problems.
venice_ai.exceptions.APITimeoutError – If the request times out before completion.
venice_ai.exceptions.APIConnectionError – For network connectivity issues or connection failures.
venice_ai.exceptions.APIError – For other API-related errors not covered by specific exceptions.
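
The json_data description for post() says the body is serialized to JSON and sent with a Content-Type: application/json header. The sketch below shows that step with the standard library; encode_json_body is a hypothetical helper, and the SDK may delegate serialization to its HTTP layer instead.

```python
import json
from typing import Any, Dict, Mapping, Tuple


def encode_json_body(json_data: Mapping[str, Any]) -> Tuple[bytes, Dict[str, str]]:
    """Hypothetical helper: serialize a body and pair it with its header."""
    body = json.dumps(dict(json_data)).encode("utf-8")
    return body, {"Content-Type": "application/json"}


body, headers = encode_json_body({"model": "venice-1", "messages": []})
print(body.decode("utf-8"), headers)

# Values that are not JSON-serializable fail before any request is sent.
try:
    encode_json_body({"tags": {"a", "b"}})  # a set has no JSON representation
except TypeError as exc:
    print(f"rejected: {exc}")
```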

Provides a synchronous client for interacting with the Venice.ai API.

This client provides a complete interface for making synchronous requests to all Venice AI API endpoints. It handles authentication, request formation, response parsing, and error management through a clean, resource-oriented design.

Parameters:

api_key (str) – Your Venice.ai API key. This is required for authentication.
base_url (Optional[Union[str, httpx.URL]]) – Overrides the default base URL. Defaults to the Venice AI production API URL. Useful for testing against different environments.
timeout (Optional[Union[float, httpx.Timeout]]) – Request timeout in seconds or as a detailed httpx.Timeout object for more granular control. Defaults to 60.0 seconds.
default_timeout (Optional[httpx.Timeout]) – Global default timeout for all API calls made by this client instance. If provided, this will be used as the default timeout for all requests unless overridden on a per-request basis. Takes precedence over the timeout parameter.
max_retries (int) – Maximum number of retries for connection errors or transient failures. This parameter controls the total number of retries for the httpx-retries mechanism. Defaults to 2.
retry_backoff_factor (float) – Backoff factor for retry delays. Defaults to 0.5.
retry_status_forcelist (Optional[List[int]]) – List of HTTP status codes to retry on. Defaults to [429, 500, 502, 503, 504].
retry_respect_retry_after_header (bool) – Whether to respect Retry-After headers. Defaults to True.
http_client (Optional[httpx.Client]) –
An optional pre-configured httpx.Client instance to use for HTTP requests. If provided:
- The SDK will use this custom client directly.
- The SDK will still configure base_url (from the base_url parameter or default), timeout (from default_timeout or timeout parameter), and Authorization headers on this provided client instance.
- All other HTTP-related parameters passed to this constructor (e.g., max_retries, retry_backoff_factor, proxy, transport, limits, verify, etc.) will be ignored. It is assumed that the provided http_client is already configured with these aspects.
- You are responsible for managing the lifecycle of the provided http_client (e.g., closing it).
If not provided, the SDK will create and manage its own internal httpx.Client.
proxy (Optional[Union[str, httpx.URL, httpx.Proxy]]) – Proxy configuration for HTTP requests. Only used when http_client is not provided.
transport (Optional[httpx.BaseTransport]) – Custom transport for HTTP requests. Only used when http_client is not provided.
limits (Optional[httpx.Limits]) – Connection limits configuration. Only used when http_client is not provided.
cert (Optional[Union[str, Tuple[str, str]]]) – Client certificate configuration. Only used when http_client is not provided.
verify (Optional[Union[bool, str, ssl.SSLContext]]) – SSL certificate verification. Only used when http_client is not provided.
trust_env (Optional[bool]) – Whether to trust environment variables for proxy configuration. Only used when http_client is not provided.
http1 (Optional[bool]) – Whether to enable HTTP/1.1. Only used when http_client is not provided.
http2 (Optional[bool]) – Whether to enable HTTP/2. Only used when http_client is not provided.
default_encoding (Optional[Union[str, Callable[[bytes], str]]]) – Default encoding for response content. Only used when http_client is not provided.
event_hooks (Optional[Mapping[str, List[Callable[..., Any]]]]) – Event hooks for request/response lifecycle. Only used when http_client is not provided.
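
How the retry parameters interact can be sketched as follows. The exponential formula (backoff_factor * 2 ** attempt) is an assumption based on a common backoff scheme; the delays actually used by the httpx-retries mechanism may differ, for example through jitter or caps. The helper names are hypothetical.

```python
from typing import List, Optional, Sequence

# Documented default for retry_status_forcelist.
DEFAULT_STATUS_FORCELIST = (429, 500, 502, 503, 504)


def backoff_delays(max_retries: int = 2, backoff_factor: float = 0.5) -> List[float]:
    """Delay before each retry, assuming plain exponential backoff."""
    return [backoff_factor * (2 ** attempt) for attempt in range(max_retries)]


def should_retry(
    status_code: int,
    attempt: int,
    max_retries: int = 2,
    forcelist: Sequence[int] = DEFAULT_STATUS_FORCELIST,
) -> bool:
    """Retry only listed status codes, and only while retries remain."""
    return attempt < max_retries and status_code in forcelist


def delay_for(
    attempt: int,
    backoff_factor: float = 0.5,
    retry_after: Optional[float] = None,
    respect_retry_after: bool = True,
) -> float:
    """Prefer a server-provided Retry-After value when configured to respect it."""
    if respect_retry_after and retry_after is not None:
        return retry_after
    return backoff_factor * (2 ** attempt)


print(backoff_delays())                  # delays for the documented defaults
print(should_retry(503, attempt=0))      # listed status, retries remain
print(should_retry(404, attempt=0))      # status not in the forcelist
print(delay_for(1, retry_after=7.0))     # Retry-After overrides the backoff
```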

chat

Access to chat-related endpoints.

Type: ChatResource

models

Access to model listing and information endpoints.

Type: Models

image

Access to image generation and manipulation endpoints.

Type: Image

audio

Access to speech synthesis and audio processing endpoints.

Type: Audio

billing

Access to billing and usage information endpoints.

Type: Billing

embeddings

Access to embedding generation endpoints.

Type: Embeddings

api_keys

Access to API key management endpoints.

Type: ApiKeys

characters

Access to character management endpoints.

Type: Characters

Examples

Basic usage:

from venice_ai import VeniceClient

client = VeniceClient(api_key="your-api-key")
response = client.chat.completions.create(
    model="venice-1",
    messages=[{"role": "user", "content": "Hello, world!"}]
)
print(response["choices"][0]["message"]["content"])
client.close() # Important to close the client when done

Using as a context manager (recommended):

from venice_ai import VeniceClient

with VeniceClient(api_key="your-api-key") as client:
    response = client.chat.completions.create(
        model="venice-1",
        messages=[{"role": "user", "content": "Hello, world!"}]
    )
    print(response["choices"][0]["message"]["content"])
# Client is automatically closed here

Streaming example:

from venice_ai import VeniceClient

with VeniceClient(api_key="your-api-key") as client:
    for chunk in client.chat.completions.create(
        model="venice-1",
        messages=[{"role": "user", "content": "Count to 5"}],
        stream=True
    ):
        content = chunk["choices"][0]["delta"].get("content", "")
        if content:
            print(content, end="", flush=True)

Raises: ValueError – If api_key is empty or None.

Note

When used as a context manager (in a with statement), the client automatically closes the underlying HTTP client on exit and frees its resources. For manual resource management, always call the close() method when done.
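
For manual resource management, a try/finally block guarantees that close() runs even if a request raises. The sketch uses StubClient, a hypothetical stand-in with the same close() method, so it runs without the SDK; contextlib.closing is a standard-library wrapper that works with any object exposing close().

```python
from contextlib import closing


class StubClient:
    """Hypothetical stand-in exposing the same close() method as the client."""

    def __init__(self) -> None:
        self.closed = False

    def ping(self) -> str:
        return "ok"

    def close(self) -> None:
        self.closed = True


# Manual management: close() runs even if the body raises.
manual = StubClient()
try:
    manual.ping()
finally:
    manual.close()

# Equivalent using the standard library: closing() calls close() on exit.
with closing(StubClient()) as wrapped:
    wrapped.ping()

print(manual.closed, wrapped.closed)
```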

Parameters:

http_transport_options (typing.Optional[typing.Dict[str, typing.Any]])
follow_redirects (typing.Union[bool, venice_ai.utils.NotGivenType, None])
max_redirects (typing.Union[int, venice_ai.utils.NotGivenType, None])

property api_key: str

Get the API key for authentication.

Returns the explicitly set API key, or falls back to the VENICE_API_KEY environment variable if no key was explicitly provided.

Returns: The API key to use for authentication.
Return type: str

build_request(method: str, path: str, *, json_data: Mapping[str, Any] | None = None, headers: Mapping[str, str] | None = None, params: Mapping[str, Any] | None = None)[source]

Build a request with proper headers including authentication.

This method constructs the headers for a request, merging authentication headers with any provided headers. The authentication header is derived from the current api_key value.

Parameters:

method (str) – HTTP method for the request.
path (str) – API endpoint path relative to the base URL.
json_data (Optional[Mapping[str, Any]]) – JSON-serializable request body.
headers (Optional[Mapping[str, str]]) – Additional HTTP headers to include.
params (Optional[Mapping[str, Any]]) – URL query parameters.

Returns:

Dictionary containing the built request information.

Return type:

Dict[str, Any]

close()[source]

Close the underlying HTTP client and free resources.

This method should be called when the client is no longer needed to ensure proper cleanup of resources. If using the client as a context manager, this is called automatically on exit.

It is safe to call this method multiple times.

Note

If a user-provided httpx.Client was passed to the constructor, this method will not close it, as the user is responsible for managing the lifecycle of their own client.

Return type: None

delete(path: str, *, cast_to: Type[venice_ai._client.T] | None = None, **kwargs)[source]

Make a DELETE request to the specified API endpoint.

This is a convenience method for making DELETE requests. It automatically handles header configuration appropriate for DELETE requests.

Parameters:

path (str) – API endpoint path relative to the base URL.
cast_to (Optional[Type[T]]) – Optional Pydantic model to cast the response to.
kwargs – Additional arguments to pass to _request().

Returns:

Parsed JSON response body.

Return type:

Any

Raises:

venice_ai.exceptions.APIError – If the request fails.

get(path: str, *, params: Mapping[str, Any] | None = None, cast_to: Type[venice_ai._client.T] | None = None, **kwargs)[source]

Make a GET request to the specified API endpoint.

This is a convenience method for making GET requests. It automatically handles header configuration appropriate for GET requests.

Parameters:

path (str) – API endpoint path relative to the base URL.
params (Optional[Mapping[str, Any]]) – URL query parameters to include in the request.
cast_to (Optional[Type[T]]) – Optional Pydantic model to cast the response to.
kwargs – Additional arguments to pass to _request().

Returns:

Parsed JSON response body.

Return type:

Any

Raises:

venice_ai.exceptions.APIError – If the request fails.

get_model_pricing(model_id: str)[source]

Get pricing information for a specific model.

Retrieves the pricing structure for a given model ID, including both USD and VCU (Venice Compute Units) costs for input and output tokens.

Parameters: model_id (str) – The ID of the model to get pricing for
Returns: Pricing information for the model
Return type: ModelPricing
Raises: ValueError – If the model is not found

Example

>>> client = VeniceClient(api_key="your-api-key")
>>> pricing = client.get_model_pricing("llama-3.3-70b")
>>> print(f"Input: ${pricing['input']['usd']}/1k tokens")
>>> print(f"Output: ${pricing['output']['usd']}/1k tokens")

post(path: str, *, json_data: Mapping[str, Any] | None = None, timeout: float | httpx.Timeout | None = None, cast_to: Type[venice_ai._client.T] | None = None, **kwargs)[source]

Make a POST request to the specified API endpoint.

This is a convenience method for making POST requests with JSON data. It automatically sets appropriate headers for JSON content.

Parameters:

path (str) – API endpoint path relative to the base URL.
json_data (Optional[Mapping[str, Any]]) – JSON-serializable request body to send with the request.
timeout (Optional[Union[float, httpx.Timeout]]) – Request timeout in seconds or an httpx.Timeout object. If not provided, uses the client’s default timeout.
cast_to (Optional[Type[T]]) – Optional Pydantic model to cast the response to.
kwargs – Additional arguments to pass to _request().

Returns:

Parsed JSON response body.

Return type:

Any

Raises:

venice_ai.exceptions.APIError – If the request fails.

Bases: BaseClient

Provides a synchronous client for interacting with the Venice.ai API.

Initialize the VeniceClient.

This constructor sets up the client for making API requests. It configures authentication, base URL, timeout settings, and retry mechanisms. It also initializes all the resource namespaces (e.g., chat, models).

Parameters:

api_key (str) – The API key for authentication. Must not be empty or None.
base_url (Optional[Union[str, httpx.URL]]) – Optional base URL to override the default Venice AI API URL. If not provided, uses the default production API URL.
timeout (Optional[Union[float, httpx.Timeout]]) – Request timeout in seconds or as an httpx.Timeout object for more granular control. Defaults to 60.0 seconds.
default_timeout (Optional[httpx.Timeout]) – Global default timeout for all API calls made by this client instance. If provided, this will be used as the default timeout for all requests unless overridden on a per-request basis. Takes precedence over the timeout parameter.
max_retries (int) – Maximum number of retries for connection errors or transient failures. This parameter controls the total number of retries for the httpx-retries mechanism. Defaults to 2.
retry_backoff_factor (float) – Backoff factor for retry delays. Defaults to 0.5.
retry_status_forcelist (Optional[List[int]]) – List of HTTP status codes to retry on. Defaults to [429, 500, 502, 503, 504].
retry_respect_retry_after_header (bool) – Whether to respect Retry-After headers. Defaults to True.
http_client (Optional[httpx.Client]) –
An optional pre-configured httpx.Client instance to use for HTTP requests. If provided:
- The SDK will use this custom client directly.
- The SDK will still configure base_url (from the base_url parameter or default), timeout (from default_timeout or timeout parameter), and Authorization headers on this provided client instance.
- All other HTTP-related parameters passed to this constructor (e.g., max_retries, retry_backoff_factor, proxy, transport, limits, verify, etc.) will be ignored. It is assumed that the provided http_client is already configured with these aspects.
- You are responsible for managing the lifecycle of the provided http_client (e.g., closing it).
If not provided, the SDK will create and manage its own internal httpx.Client.
http_transport_options (typing.Optional[typing.Dict[str, typing.Any]])
proxy (typing.Union[httpx.URL, str, httpx.Proxy, venice_ai.utils.NotGivenType, None])
transport (typing.Union[httpx.BaseTransport, venice_ai.utils.NotGivenType, None])
limits (typing.Union[httpx.Limits, venice_ai.utils.NotGivenType, None])
cert (typing.Union[str, typing.Tuple[str, str], typing.Tuple[str, str, str], venice_ai.utils.NotGivenType, None])
verify (typing.Union[bool, str, ssl.SSLContext, venice_ai.utils.NotGivenType, None])
trust_env (typing.Union[bool, venice_ai.utils.NotGivenType, None])
http1 (typing.Union[bool, venice_ai.utils.NotGivenType, None])
http2 (typing.Union[bool, venice_ai.utils.NotGivenType, None])
follow_redirects (typing.Union[bool, venice_ai.utils.NotGivenType, None])
max_redirects (typing.Union[int, venice_ai.utils.NotGivenType, None])
default_encoding (typing.Union[str, typing.Callable[[bytes], str], venice_ai.utils.NotGivenType, None])
event_hooks (typing.Union[typing.Mapping[str, typing.List[typing.Callable[..., typing.Any]]], venice_ai.utils.NotGivenType, None])

Raises:

ValueError – If api_key is empty or None and VENICE_API_KEY environment variable is not set.
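
To make the retry parameters more concrete, the sketch below computes a delay schedule from max_retries and retry_backoff_factor and checks a status code against retry_status_forcelist. It assumes a plain exponential schedule (backoff_factor * 2**attempt, no jitter), which is a common convention but is not confirmed here as the exact formula used by the httpx-retries mechanism. All function names are hypothetical.

```python
from typing import List, Optional, Sequence

# Documented default for retry_status_forcelist
DEFAULT_STATUS_FORCELIST = (429, 500, 502, 503, 504)


def should_retry(status_code: int,
                 forcelist: Sequence[int] = DEFAULT_STATUS_FORCELIST) -> bool:
    """Return True if the status code is in the retry list."""
    return status_code in forcelist


def retry_delays(max_retries: int = 2, backoff_factor: float = 0.5) -> List[float]:
    """Delays before each retry. ASSUMED schedule: factor * 2**attempt."""
    return [backoff_factor * (2 ** attempt) for attempt in range(max_retries)]


def next_delay(attempt: int,
               backoff_factor: float = 0.5,
               retry_after: Optional[float] = None,
               respect_retry_after: bool = True) -> float:
    """Prefer a server-supplied Retry-After value (in seconds) when allowed."""
    if respect_retry_after and retry_after is not None:
        return retry_after
    return backoff_factor * (2 ** attempt)


print(retry_delays())                 # schedule for the documented defaults
print(should_retry(429), should_retry(404))
print(next_delay(1, retry_after=7.0))
```

Under these assumptions, the documented defaults (max_retries=2, retry_backoff_factor=0.5) yield two retries with growing delays, and a Retry-After value replaces the computed delay when retry_respect_retry_after_header is True.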

api_keys: venice_ai.resources.api_keys.ApiKeys

audio: venice_ai.resources.audio.Audio

billing: venice_ai.resources.billing.Billing

characters: venice_ai.resources.characters.Characters

chat: venice_ai.resources.chat.ChatResource

close()[source]

Close the underlying HTTP client and free resources.

This method should be called when the client is no longer needed to ensure proper cleanup of resources. If using the client as a context manager, this is called automatically on exit.

It is safe to call this method multiple times.

Note

If a user-provided httpx.Client was passed to the constructor, this method will not close it, as the user is responsible for managing the lifecycle of their own client.

Return type:: None
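
The close() semantics described above (idempotent, and never closing a user-provided client) can be pictured with a small ownership flag. This is an illustrative sketch only: FakeHTTPClient and OwnershipSketchClient are hypothetical stand-ins, not the SDK's classes, and the SDK's internals may differ.

```python
from typing import Optional


class FakeHTTPClient:
    """Hypothetical stand-in for httpx.Client; only tracks closed state."""

    def __init__(self) -> None:
        self.is_closed = False
        self.close_calls = 0

    def close(self) -> None:
        self.is_closed = True
        self.close_calls += 1


class OwnershipSketchClient:
    """Closes the HTTP client only if it created that client itself."""

    def __init__(self, http_client: Optional[FakeHTTPClient] = None) -> None:
        self._owns_client = http_client is None
        self._http = FakeHTTPClient() if http_client is None else http_client
        self._closed = False

    def close(self) -> None:
        if self._closed:
            return  # safe to call more than once
        self._closed = True
        if self._owns_client:
            self._http.close()

    def __enter__(self) -> "OwnershipSketchClient":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()


# Internal client: closed exactly once, even if close() is called twice.
internal = OwnershipSketchClient()
internal.close()
internal.close()

# User-provided client: left open when the context manager exits.
shared = FakeHTTPClient()
with OwnershipSketchClient(http_client=shared):
    pass
print(internal._http.close_calls, shared.is_closed)
```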

delete(path: str, *, cast_to: Type[venice_ai._client.T] | None = None, **kwargs)[source]

Make a DELETE request to the specified API endpoint.

This is a convenience method for making DELETE requests. It automatically handles header configuration appropriate for DELETE requests.

Parameters:

path (str) – API endpoint path relative to the base URL.
cast_to (Optional[Type[T]]) – Optional Pydantic model to cast the response to.
kwargs – Additional arguments to pass to _request().

Returns:

Parsed JSON response body.

Return type:

Any

Raises:

venice_ai.exceptions.APIError – If the request fails.

embeddings: venice_ai.resources.embeddings.Embeddings

get(path: str, *, params: Mapping[str, Any] | None = None, cast_to: Type[venice_ai._client.T] | None = None, **kwargs)[source]

Make a GET request to the specified API endpoint.

This is a convenience method for making GET requests. It automatically handles header configuration appropriate for GET requests.

Parameters:

path (str) – API endpoint path relative to the base URL.
params (Optional[Mapping[str, Any]]) – URL query parameters to include in the request.
cast_to (Optional[Type[T]]) – Optional Pydantic model to cast the response to.
kwargs – Additional arguments to pass to _request().

Returns:

Parsed JSON response body.

Return type:

Any

Raises:

venice_ai.exceptions.APIError – If the request fails.

image: venice_ai.resources.image.Image

models: venice_ai.resources.models.Models

post(path: str, *, json_data: Mapping[str, Any] | None = None, timeout: float | httpx.Timeout | None = None, cast_to: Type[venice_ai._client.T] | None = None, **kwargs)[source]

Make a POST request to the specified API endpoint.

This is a convenience method for making POST requests with JSON data. It automatically sets appropriate headers for JSON content.

Parameters:

path (str) – API endpoint path relative to the base URL.
json_data (Optional[Mapping[str, Any]]) – JSON-serializable request body to send with the request.
timeout (Optional[Union[float, httpx.Timeout]]) – Request timeout in seconds or an httpx.Timeout object. If not provided, uses the client’s default timeout.
cast_to (Optional[Type[T]]) – Optional Pydantic model to cast the response to.
kwargs – Additional arguments to pass to _request().

Returns:

Parsed JSON response body.

Return type:

Any

Raises:

venice_ai.exceptions.APIError – If the request fails.
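
The timeout rules documented for the constructor and for post() imply an order of precedence: a per-request timeout first, then default_timeout, then timeout, then the 60.0-second default. The sketch below encodes that reading with plain floats for simplicity (the real parameters also accept httpx.Timeout objects). resolve_timeout is a hypothetical helper, not part of the SDK.

```python
from typing import Optional

SDK_DEFAULT_TIMEOUT = 60.0  # documented default, in seconds


def resolve_timeout(per_request: Optional[float] = None,
                    default_timeout: Optional[float] = None,
                    timeout: Optional[float] = None) -> float:
    """Return the first timeout that was provided, in precedence order."""
    for candidate in (per_request, default_timeout, timeout):
        if candidate is not None:
            return candidate
    return SDK_DEFAULT_TIMEOUT


print(resolve_timeout())
print(resolve_timeout(timeout=30.0))
print(resolve_timeout(default_timeout=10.0, timeout=30.0))
print(resolve_timeout(per_request=5.0, default_timeout=10.0, timeout=30.0))
```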

Bases: BaseClient

Provides an asynchronous client for interacting with the Venice.ai API.

Parameters:

api_key (str) – Your Venice.ai API key. This is required for authentication.
base_url (Optional[Union[str, httpx.URL]]) – Overrides the default base URL. Defaults to the Venice AI production API URL. Useful for testing against different environments.
timeout (Optional[Union[float, httpx.Timeout]]) – Request timeout in seconds or as a detailed httpx.Timeout object for more granular control. Defaults to 60.0 seconds.
default_timeout (Optional[httpx.Timeout]) – Global default timeout for all API calls made by this client instance. If provided, this will be used as the default timeout for all requests unless overridden on a per-request basis. Takes precedence over the timeout parameter.
max_retries (int) – Maximum number of retries for connection errors or transient failures. This parameter controls the total number of retries for the httpx-retries mechanism. Defaults to 2.
retry_backoff_factor (float) – Backoff factor for retry delays. Defaults to 0.5.
retry_status_forcelist (Optional[List[int]]) – List of HTTP status codes to retry on. Defaults to [429, 500, 502, 503, 504].
retry_respect_retry_after_header (bool) – Whether to respect Retry-After headers. Defaults to True.
http_client (Optional[httpx.AsyncClient]) –
An optional pre-configured httpx.AsyncClient instance to use for HTTP requests. If provided:
- The SDK will use this custom client directly.
- The SDK will still configure base_url (from the base_url parameter or default), timeout (from default_timeout or timeout parameter), and Authorization headers on this provided client instance.
- All other HTTP-related parameters passed to this constructor (e.g., max_retries, retry_backoff_factor, proxy, transport, limits, verify, etc.) will be ignored. It is assumed that the provided http_client is already configured with these aspects.
- You are responsible for managing the lifecycle of the provided http_client (e.g., closing it via await http_client.aclose()).
If not provided, the SDK will create and manage its own internal httpx.AsyncClient.
proxy (Optional[Union[str, httpx.URL, httpx.Proxy]]) – Proxy configuration for HTTP requests. Only used when http_client is not provided.
transport (Optional[httpx.BaseTransport]) – Custom transport for HTTP requests. Only used when http_client is not provided.
limits (Optional[httpx.Limits]) – Connection limits configuration. Only used when http_client is not provided.
cert (Optional[Union[str, Tuple[str, str]]]) – Client certificate configuration. Only used when http_client is not provided.
verify (Optional[Union[bool, str, ssl.SSLContext]]) – SSL certificate verification. Only used when http_client is not provided.
trust_env (Optional[bool]) – Whether to trust environment variables for proxy configuration. Only used when http_client is not provided.
http1 (Optional[bool]) – Whether to enable HTTP/1.1. Only used when http_client is not provided.
http2 (Optional[bool]) – Whether to enable HTTP/2. Only used when http_client is not provided.
default_encoding (Optional[Union[str, Callable[[bytes], str]]]) – Default encoding for response content. Only used when http_client is not provided.
event_hooks (Optional[Mapping[str, List[Callable[..., Any]]]]) – Event hooks for request/response lifecycle. Only used when http_client is not provided.

chat

Access to chat-related endpoints.

Type:: AsyncChatResource

models

Access to model listing and information endpoints.

Type:: AsyncModels

image

Access to image generation and manipulation endpoints.

Type:: AsyncImage

audio

Access to speech synthesis and audio processing endpoints.

Type:: AsyncAudio

billing

Access to billing and usage information endpoints.

Type:: AsyncBilling

embeddings

Access to embedding generation endpoints.

Type:: AsyncEmbeddings

api_keys

Access to API key management endpoints.

Type:: AsyncApiKeys

characters

Access to character management endpoints.

Type:: AsyncCharacters

Examples

Basic usage:

from venice_ai import AsyncVeniceClient

async with AsyncVeniceClient(api_key="your-api-key") as client:
    response = await client.chat.completions.create(
        model="venice-1",
        messages=[{"role": "user", "content": "Hello, world!"}]
    )
    print(response["choices"][0]["message"]["content"])

Streaming example:

from venice_ai import AsyncVeniceClient

async with AsyncVeniceClient(api_key="your-api-key") as client:
    async for chunk in client.chat.completions.create(
        model="venice-1",
        messages=[{"role": "user", "content": "Count to 5"}],
        stream=True
    ):
        content = chunk["choices"][0]["delta"].get("content", "")
        if content:
            print(content, end="", flush=True)

Using with a custom httpx client:

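import httpx
from venice_ai import AsyncVeniceClient

# Create a custom client with specific configurations.
# httpx.Timeout needs either a default value or all four of
# connect, read, write, and pool set explicitly.
custom_client = httpx.AsyncClient(
    timeout=httpx.Timeout(connect=5.0, read=30.0, write=10.0, pool=5.0),
    follow_redirects=True,
    http2=True  # Requires the optional HTTP/2 dependency: pip install "httpx[http2]"
)

# Use the custom client with AsyncVeniceClient
async with AsyncVeniceClient(
    api_key="your-api-key",
    http_client=custom_client
) as client:
    # Your API operations here
    pass

# The SDK does not close a user-provided client, so close it yourself
await custom_client.aclose()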

Raises:: ValueError – If api_key is empty or None.

Parameters:

http_transport_options (typing.Optional[typing.Dict[str, typing.Any]])
async_transport (typing.Union[httpx.AsyncBaseTransport, venice_ai.utils.NotGivenType, None])
follow_redirects (typing.Union[bool, venice_ai.utils.NotGivenType, None])
max_redirects (typing.Union[int, venice_ai.utils.NotGivenType, None])

Initialize the AsyncVeniceClient for asynchronous API interactions.

This constructor sets up the client for making asynchronous API requests to the Venice AI API. It configures authentication, base URL, timeout settings, retry mechanisms, and initializes all the asynchronous resource namespaces (e.g., chat, models, image, audio).

The client can be configured with custom HTTP settings through the http_client parameter, or it will create its own httpx.AsyncClient with appropriate defaults. When providing a custom client, essential headers like Authorization will be automatically set or updated.

Parameters:

api_key (str) – Your Venice.ai API key for authentication. This is required and cannot be empty. The key will be automatically stripped of whitespace to prevent authentication issues.
base_url (Optional[Union[str, httpx.URL]]) – Base URL for the Venice AI API. If not provided, defaults to the production Venice AI API URL. Can be a string or httpx.URL object. Useful for testing against different environments or API versions.
timeout (Optional[Union[float, httpx.Timeout]]) – Request timeout configuration. Can be a float (seconds) for simple timeout, or an httpx.Timeout object for granular control over connect, read, write, and pool timeouts. Defaults to 60.0 seconds if not specified.
default_timeout (Optional[httpx.Timeout]) – Global default timeout for all API calls made by this client instance. If provided, this will be used as the default timeout for all requests unless overridden on a per-request basis. Takes precedence over the timeout parameter.
max_retries (int) – Maximum number of automatic retries for failed requests due to connection errors or transient failures. This parameter controls the total number of retries for the httpx-retries mechanism. Defaults to 2.
retry_backoff_factor (float) – Backoff factor for retry delays. Defaults to 0.5.
retry_status_forcelist (Optional[List[int]]) – List of HTTP status codes to retry on. Defaults to [429, 500, 502, 503, 504].
retry_respect_retry_after_header (bool) – Whether to respect Retry-After headers. Defaults to True.
http_client (Optional[httpx.AsyncClient]) –
An optional pre-configured httpx.AsyncClient instance to use for HTTP requests. If provided:
- The SDK will use this custom client directly.
- The SDK will still configure base_url (from the base_url parameter or default), timeout (from default_timeout or timeout parameter), and Authorization headers on this provided client instance.
- All other HTTP-related parameters passed to this constructor (e.g., max_retries, retry_backoff_factor, proxy, transport, limits, verify, etc.) will be ignored. It is assumed that the provided http_client is already configured with these aspects.
- You are responsible for managing the lifecycle of the provided http_client (e.g., closing it via await http_client.aclose()).
If not provided, the SDK will create and manage its own internal httpx.AsyncClient.

Raises:

ValueError – If api_key is empty, None, or consists only of whitespace.

Note

When using a custom http_client, ensure it’s configured appropriately for your use case. As described for the http_client parameter, the SDK sets base_url, timeout, and the Authorization header on the provided client; it does not change other settings such as proxies or SSL configuration.

Parameters:

http_transport_options (typing.Optional[typing.Dict[str, typing.Any]])
proxy (typing.Union[httpx.URL, str, httpx.Proxy, venice_ai.utils.NotGivenType, None])
transport (typing.Union[httpx.BaseTransport, venice_ai.utils.NotGivenType, None])
async_transport (typing.Union[httpx.AsyncBaseTransport, venice_ai.utils.NotGivenType, None])
limits (typing.Union[httpx.Limits, venice_ai.utils.NotGivenType, None])
cert (typing.Union[str, typing.Tuple[str, str], typing.Tuple[str, str, str], venice_ai.utils.NotGivenType, None])
verify (typing.Union[bool, str, ssl.SSLContext, venice_ai.utils.NotGivenType, None])
trust_env (typing.Union[bool, venice_ai.utils.NotGivenType, None])
http1 (typing.Union[bool, venice_ai.utils.NotGivenType, None])
http2 (typing.Union[bool, venice_ai.utils.NotGivenType, None])
follow_redirects (typing.Union[bool, venice_ai.utils.NotGivenType, None])
max_redirects (typing.Union[int, venice_ai.utils.NotGivenType, None])
default_encoding (typing.Union[str, typing.Callable[[bytes], str], venice_ai.utils.NotGivenType, None])
event_hooks (typing.Union[typing.Mapping[str, typing.List[typing.Callable[..., typing.Any]]], venice_ai.utils.NotGivenType, None])
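
Many of the parameters above accept venice_ai.utils.NotGivenType in addition to None. A sentinel of this kind is typically used to tell "argument omitted" apart from "explicitly None". The sketch below shows that general pattern with a local stand-in class; it is an assumption about intent, and the SDK's actual handling is not documented in this section. build_client_kwargs is a hypothetical helper.

```python
from typing import Any, Dict


class NotGivenType:
    """Local stand-in for a 'not given' sentinel type (not the SDK's class)."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGivenType()


def build_client_kwargs(**options: Any) -> Dict[str, Any]:
    """Drop options that were never supplied; keep explicit values, even None."""
    return {
        name: value
        for name, value in options.items()
        if not isinstance(value, NotGivenType)
    }


kwargs = build_client_kwargs(proxy=NOT_GIVEN, verify=None, http2=True)
print(kwargs)
```

Here proxy is left out entirely, so the underlying HTTP client would fall back to its own default, while verify=None is passed through as an explicit choice.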

api_keys: venice_ai.resources.api_keys.AsyncApiKeys

audio: venice_ai.resources.audio.AsyncAudio

billing: venice_ai.resources.billing.AsyncBilling

characters: venice_ai.resources.characters.AsyncCharacters

chat: venice_ai.resources.chat.AsyncChatResource

async close()[source]

Close the underlying asynchronous HTTP client and free all associated resources.

Note

If a user-provided httpx.AsyncClient was passed to the constructor, this method will not close it, as the user is responsible for managing the lifecycle of their own client.

After calling this method, the client should not be used for making further API requests. Attempting to use a closed client may result in errors or undefined behavior.

Return type:: None

async delete(path: str, *, cast_to: Type[venice_ai._async_client.T] | None = None, **kwargs)[source]

Make an asynchronous DELETE request to the specified API endpoint.

Parameters:

path (str) – API endpoint path relative to the client’s base URL. Should not include a leading slash as it will be properly joined with the base URL.
cast_to (Optional[Type[T]]) – Optional Pydantic model to cast the response to.
kwargs – Additional keyword arguments to pass to the underlying _request method. This can include options like headers, params, timeout, or raw_response.

Returns:

Parsed JSON response data as Python objects (typically dict or list). Many DELETE endpoints return confirmation data or the deleted resource details.

Return type:

Any

Raises:

venice_ai.exceptions.InvalidRequestError – For HTTP 400 errors indicating invalid request parameters.
venice_ai.exceptions.AuthenticationError – For HTTP 401 errors indicating invalid or missing API key.
venice_ai.exceptions.PermissionDeniedError – For HTTP 403 errors indicating insufficient permissions.
venice_ai.exceptions.NotFoundError – For HTTP 404 errors indicating the requested resource was not found.
venice_ai.exceptions.RateLimitError – For HTTP 429 errors indicating rate limit exceeded.
venice_ai.exceptions.InternalServerError – For HTTP 5xx errors indicating server-side problems.
venice_ai.exceptions.APITimeoutError – If the request times out before completion.
venice_ai.exceptions.APIConnectionError – For network connectivity issues or connection failures.
venice_ai.exceptions.APIError – For other API-related errors not covered by specific exceptions.
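
The Raises list above pairs HTTP status codes with exception types. The sketch below restates that pairing as a lookup so the fallback order is explicit. The exception classes are local stand-ins that reuse the documented names; making them subclasses of APIError is an assumption, and exception_class_for_status is a hypothetical helper rather than an SDK function.

```python
class APIError(Exception):
    """Stand-in base class (the hierarchy is an assumption)."""


class InvalidRequestError(APIError): ...
class AuthenticationError(APIError): ...
class PermissionDeniedError(APIError): ...
class NotFoundError(APIError): ...
class RateLimitError(APIError): ...
class InternalServerError(APIError): ...


# Status codes with a dedicated exception, as listed in the documentation
_SPECIFIC = {
    400: InvalidRequestError,
    401: AuthenticationError,
    403: PermissionDeniedError,
    404: NotFoundError,
    429: RateLimitError,
}


def exception_class_for_status(status_code: int) -> type:
    """Pick the documented exception type for an HTTP error status."""
    if status_code in _SPECIFIC:
        return _SPECIFIC[status_code]
    if 500 <= status_code <= 599:
        return InternalServerError
    return APIError  # anything else falls back to the generic error


for code in (400, 401, 403, 404, 429, 503, 418):
    print(code, exception_class_for_status(code).__name__)
```

In calling code, this ordering suggests catching the specific exceptions first and APIError last.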

embeddings: venice_ai.resources.embeddings.AsyncEmbeddings

async get(path: str, *, params: Mapping[str, Any] | None = None, cast_to: Type[venice_ai._async_client.T] | None = None, **kwargs)[source]

Make an asynchronous GET request to the specified API endpoint.

Parameters:

path (str) – API endpoint path relative to the client’s base URL. Should not include a leading slash as it will be properly joined with the base URL.
params (Optional[Mapping[str, Any]]) – URL query parameters to include in the request. These will be properly URL-encoded and appended to the request URL.
cast_to (Optional[Type[T]]) – Optional Pydantic model to cast the response to.
kwargs – Additional keyword arguments to pass to the underlying _request method. This can include options like headers, timeout, or raw_response.

Returns:

Parsed JSON response data as Python objects (typically dict or list).

Return type:

Any

Raises:

venice_ai.exceptions.InvalidRequestError – For HTTP 400 errors indicating invalid request parameters.
venice_ai.exceptions.AuthenticationError – For HTTP 401 errors indicating invalid or missing API key.
venice_ai.exceptions.PermissionDeniedError – For HTTP 403 errors indicating insufficient permissions.
venice_ai.exceptions.NotFoundError – For HTTP 404 errors indicating the requested resource was not found.
venice_ai.exceptions.RateLimitError – For HTTP 429 errors indicating rate limit exceeded.
venice_ai.exceptions.InternalServerError – For HTTP 5xx errors indicating server-side problems.
venice_ai.exceptions.APITimeoutError – If the request times out before completion.
venice_ai.exceptions.APIConnectionError – For network connectivity issues or connection failures.
venice_ai.exceptions.APIError – For other API-related errors not covered by specific exceptions.
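
The advice to omit the leading slash from path, and the note that params are URL-encoded, can be illustrated with the standard library. This sketch uses generic RFC 3986 resolution via urllib.parse; it does not claim to reproduce how the SDK or httpx joins URLs. The host api.example.test and the build_url helper are hypothetical.

```python
from typing import Any, Mapping, Optional
from urllib.parse import urlencode, urljoin


def build_url(base_url: str, path: str,
              params: Optional[Mapping[str, Any]] = None) -> str:
    """Join a relative path onto a base URL and append encoded query params."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    url = urljoin(base, path.lstrip("/"))
    if params:
        url += "?" + urlencode(params)
    return url


BASE = "https://api.example.test/api/v1/"  # hypothetical base URL

# A relative path keeps the base path; a leading slash replaces it
# under plain RFC 3986 resolution.
print(urljoin(BASE, "models"))
print(urljoin(BASE, "/models"))

# The helper strips the leading slash and encodes the query string.
print(build_url(BASE, "/models", {"type": "text"}))
print(urlencode({"q": "a b", "limit": 5}))
```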

image: venice_ai.resources.image.AsyncImage

models: venice_ai.resources.models.AsyncModels

async post(path: str, *, json_data: Mapping[str, Any] | None = None, timeout: float | httpx.Timeout | None = None, cast_to: Type[venice_ai._async_client.T] | None = None, **kwargs)[source]

Make an asynchronous POST request to the specified API endpoint.

Parameters:

path (str) – API endpoint path relative to the client’s base URL. Should not include a leading slash as it will be properly joined with the base URL.
json_data (Optional[Mapping[str, Any]]) – JSON-serializable data to send in the request body. This will be automatically serialized to JSON and sent with Content-Type: application/json headers. Can include any data structure that is JSON-serializable (dict, list, primitives).
timeout (Optional[Union[float, httpx.Timeout]]) – Request timeout configuration. Can be a float specifying timeout in seconds, or an httpx.Timeout object for granular timeout control. If not provided, uses the client’s default timeout setting.
cast_to (Optional[Type[T]]) – Optional Pydantic model to cast the response to.
kwargs – Additional keyword arguments to pass to the underlying _request method. This can include options like headers, params, or raw_response.

Returns:

Parsed JSON response data as Python objects (typically dict or list).

Return type:

Any

Raises:

venice_ai.exceptions.InvalidRequestError – For HTTP 400 errors indicating invalid request parameters.
venice_ai.exceptions.AuthenticationError – For HTTP 401 errors indicating invalid or missing API key.
venice_ai.exceptions.PermissionDeniedError – For HTTP 403 errors indicating insufficient permissions.
venice_ai.exceptions.NotFoundError – For HTTP 404 errors indicating the requested resource was not found.
venice_ai.exceptions.RateLimitError – For HTTP 429 errors indicating rate limit exceeded.
venice_ai.exceptions.InternalServerError – For HTTP 5xx errors indicating server-side problems.
venice_ai.exceptions.APITimeoutError – If the request times out before completion.
venice_ai.exceptions.APIConnectionError – For network connectivity issues or connection failures.
venice_ai.exceptions.APIError – For other API-related errors not covered by specific exceptions.

Advanced HTTP Client Configuration¶

The VeniceClient and AsyncVeniceClient offer flexible ways to configure the underlying HTTP client (httpx.Client and httpx.AsyncClient respectively). This allows for advanced scenarios such as custom mTLS, specific proxy setups, detailed transport logging, or fine-tuning HTTP/2 behavior.

There are two primary methods for this:

Passing a Pre-configured httpx.Client / httpx.AsyncClient
Passing Common httpx Settings Directly to the SDK Client Constructor

Passing a Pre-configured `httpx.Client` / `httpx.AsyncClient`¶

You can instantiate the SDK client with an http_client parameter, providing your own fully configured httpx.Client or httpx.AsyncClient instance. The SDK will use this instance directly for making API calls.

Key Behaviors:

SDK Management: The SDK will still manage the base_url and authentication headers for the requests. It will also apply its default timeout if a more specific timeout (e.g., per-request timeout) is not already configured on your provided client.
Lifecycle Management: You are responsible for the lifecycle (e.g., closing) of the httpx.Client or httpx.AsyncClient instance you provide. The SDK will not close a user-provided client, even when the SDK client is closed or used as a context manager.

Use Cases:

Implementing custom mutual TLS (mTLS) authentication.
Using a very specific proxy configuration not easily achieved with simple string/dict proxies.
Integrating advanced logging or monitoring at the HTTP transport layer.
Reusing an existing httpx.Client instance that is shared across different parts of your application.

Example: VeniceClient with a custom httpx.Client

import httpx
from venice_ai import VeniceClient

# User creates and configures their own httpx.Client
custom_transport = httpx.HTTPTransport(retries=5)
my_httpx_client = httpx.Client(
    transport=custom_transport,
    # Recent httpx releases accept `proxy`; the older `proxies` argument was removed
    proxy="http://localhost:8080"
)

# Pass it to VeniceClient
# The user is responsible for closing my_httpx_client when done.
client = VeniceClient(api_key="YOUR_API_KEY", http_client=my_httpx_client)

# Use the client as usual
try:
    models = client.models.list()
    print(models)
finally:
    # User must close their client.
    # VeniceClient's close() or context manager __exit__ will NOT close my_httpx_client.
    if not my_httpx_client.is_closed:
        my_httpx_client.close()

Example: AsyncVeniceClient with a custom httpx.AsyncClient

import httpx
import asyncio
from venice_ai import AsyncVeniceClient

async def main():
    # User creates and configures their own httpx.AsyncClient
    custom_transport = httpx.AsyncHTTPTransport(retries=5)
    my_async_httpx_client = httpx.AsyncClient(
        transport=custom_transport,
        # Recent httpx releases accept `proxy`; the older `proxies` argument was removed
        proxy="http://localhost:8080"
    )

    # Pass it to AsyncVeniceClient
    # The user is responsible for closing my_async_httpx_client when done.
    async_client = AsyncVeniceClient(api_key="YOUR_API_KEY", http_client=my_async_httpx_client)

    # Use the client as usual
    try:
        models = await async_client.models.list()
        print(models)
    finally:
        # User must close their client.
        # AsyncVeniceClient's close() or context manager __aexit__ will NOT close my_async_httpx_client.
        if not my_async_httpx_client.is_closed:
            await my_async_httpx_client.aclose()

if __name__ == "__main__":
    asyncio.run(main())

Passing `httpx` Settings Directly¶

If you do not provide your own http_client instance, you can pass common httpx.Client or httpx.AsyncClient constructor arguments directly to the VeniceClient or AsyncVeniceClient constructor. The SDK will use these arguments to create and manage its internal httpx client.

Key Behaviors:

SDK Management: The SDK creates, configures, and manages the lifecycle of the internal httpx.Client or httpx.AsyncClient. It will be closed when the SDK client’s close() (or aclose()) method is called, or when the SDK client exits its context manager.
Supported Parameters: You can pass the following httpx constructor arguments:
- proxy: A proxy URL or a dictionary mapping URL schemes to proxy URLs.
- proxies: (Alternative to proxy) A dictionary mapping URL schemes or specific domain/host patterns to proxy URLs.
- transport: An httpx.HTTPTransport or httpx.AsyncHTTPTransport instance for advanced transport layer customization (e.g., connection pooling, retries, UNIX domain sockets).
- limits: An httpx.Limits instance to configure connection limits (e.g., max_connections, max_keepalive_connections).
- cert: An SSL certificate, either a path to a PEM file or a 2-tuple of (cert, key) file paths.
- verify: SSL verification. Can be a boolean (True/False) or a path to a CA bundle file. Defaults to True. Set to False with caution.
- trust_env: A boolean indicating whether to trust environment variables for proxy configuration, SSL certificates, etc. Defaults to True.
- http1: A boolean indicating whether to allow HTTP/1.1 requests. Defaults to True.
- http2: A boolean indicating whether to enable HTTP/2 support. Defaults to False (httpx default).
- follow_redirects: A boolean indicating whether to automatically follow redirects. Defaults to False for the SDK client.
- max_redirects: The maximum number of redirects to follow if follow_redirects is True.
- default_encoding: A callable or string to determine the default encoding for response text.
- event_hooks: A dictionary of event hooks (e.g., for request, response).

Use Cases:

Easily configuring a standard HTTP/S proxy.
Setting up custom SSL/TLS verification (e.g., using a corporate CA bundle).
Adjusting connection pool limits.
Enabling HTTP/2.
Customizing retry behavior via a custom transport.

Example: VeniceClient with direct httpx settings

from venice_ai import VeniceClient
import httpx # For httpx.Limits and httpx.HTTPTransport

# Pass httpx settings directly to VeniceClient constructor
# The SDK will create and manage its internal httpx.Client with these settings
client = VeniceClient(
    api_key="YOUR_API_KEY",
    proxy="http://localhost:8080",               # Example proxy (matches the documented `proxy` parameter)
    transport=httpx.HTTPTransport(retries=3),    # Example custom transport
    limits=httpx.Limits(max_connections=100, max_keepalive_connections=20), # Example limits
    verify=False                                 # Example: disable SSL verification (use with caution)
)

# Use the client as usual (SDK manages httpx.Client lifecycle)
with client: # Or client.close() when done
    models = client.models.list()
    print(models)

Example: AsyncVeniceClient with direct httpx settings

from venice_ai import AsyncVeniceClient
import httpx # For httpx.Limits and httpx.AsyncHTTPTransport
import asyncio

async def main():
    # Pass httpx settings directly to AsyncVeniceClient constructor
    # The SDK will create and manage its internal httpx.AsyncClient with these settings
    async_client = AsyncVeniceClient(
        api_key="YOUR_API_KEY",
        proxy="http://localhost:8080",               # Example proxy (matches the documented `proxy` parameter)
        transport=httpx.AsyncHTTPTransport(retries=3), # Example custom transport
        limits=httpx.Limits(max_connections=100, max_keepalive_connections=20), # Example limits
        verify=False                                 # Example: disable SSL verification (use with caution)
    )

    # Use the client as usual (SDK manages httpx.AsyncClient lifecycle)
    async with async_client: # Or await async_client.close() when done
        models = await async_client.models.list()
        print(models)

if __name__ == "__main__":
    asyncio.run(main())

Chat Resources¶

class venice_ai.resources.chat.AsyncChatResource(client: AsyncVeniceClient)[source]

Provides asynchronous access to chat-related API operations.

This class is a namespace for asynchronous chat functionality and is accessed via async_client.chat. Its completions property provides access to asynchronous chat completions.

Parameters:: client (venice_ai._async_client.AsyncVeniceClient) – The asynchronous AsyncVeniceClient instance.

completions: venice_ai.resources.chat.completions.AsyncChatCompletions: Access to asynchronous chat completion creation operations.

class venice_ai.resources.chat.ChatResource(client: venice_ai._client.VeniceClient)[source]

Provides access to chat-related API operations.

This class is a namespace for chat functionality and is accessed via client.chat. Its completions property provides access to chat completions.

Parameters:: client (venice_ai._client.VeniceClient) – The synchronous VeniceClient instance.

completions: venice_ai.resources.chat.completions.ChatCompletions: Access to chat completion creation operations.
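
The namespaced layout (client.chat.completions.create) can be pictured as small resource objects that hold a reference to the client and delegate to its request methods. The sketch below is a simplified illustration with hypothetical classes and a stubbed post() method; the endpoint path "chat/completions" and the stub reply are assumptions, and none of this is the SDK's actual implementation.

```python
from typing import Any, Dict, List, Tuple


class CompletionsSketch:
    """Leaf resource: turns keyword arguments into a request body."""

    def __init__(self, client: "NamespacedClientSketch") -> None:
        self._client = client

    def create(self, *, model: str, messages: List[Dict[str, str]]) -> Dict[str, Any]:
        body = {"model": model, "messages": messages}
        # "chat/completions" is an assumed path, used for illustration only
        return self._client.post("chat/completions", json_data=body)


class ChatSketch:
    """Namespace object: groups chat-related resources."""

    def __init__(self, client: "NamespacedClientSketch") -> None:
        self.completions = CompletionsSketch(client)


class NamespacedClientSketch:
    """Client stub that records calls instead of sending HTTP requests."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []
        self.chat = ChatSketch(self)

    def post(self, path: str, *, json_data: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append((path, json_data))
        return {"choices": [{"message": {"role": "assistant", "content": "stub reply"}}]}


client = NamespacedClientSketch()
response = client.chat.completions.create(
    model="venice-1",
    messages=[{"role": "user", "content": "Hello, world!"}],
)
print(response["choices"][0]["message"]["content"])
```

Keeping request logic on the client and only argument shaping on the resources is one way to let the synchronous and asynchronous variants share the same structure.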

class venice_ai.resources.chat.completions.AsyncChatCompletions(client: venice_ai._resource.AsyncClientT)[source]

Bases: AsyncAPIResource

Provides access to asynchronous chat completion operations.

This class manages asynchronous chat completion operations with Venice AI models, supporting both standard (non-streaming) and streaming response formats. It serves as the primary interface for chat-based interactions with Venice AI language models in asynchronous contexts.

The class handles parameter validation, request formation, and response parsing for asynchronous chat completion requests.

Parameters: _client (venice_ai._async_client.AsyncVeniceClient) – The client instance used to make API requests.

Example

from venice_ai import AsyncVeniceClient
import asyncio

async def main():
    # Initialize the async client
    client = AsyncVeniceClient(api_key="your-api-key")

    # Create a chat completion asynchronously
    response = await client.chat.completions.create(
        model="venice-1",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me about Venice AI."}
        ]
    )

    # Access the response content
    print(response["choices"][0]["message"]["content"])

# Run the async function
asyncio.run(main())

Parameters: client (typing.TypeVar(AsyncClientT, bound=AsyncVeniceClient))

async create(*, model: str, messages: Sequence[venice_ai.types.chat.MessageParam], stream: bool = False, stream_cls: Type[venice_ai.types.chat.ChunkModelFactory[venice_ai.types.chat.ChatCompletionChunk]] | None = None, **kwargs: Any)[source]

Create a model response for the given chat conversation asynchronously.

This method handles the core functionality of the chat completions API, supporting both standard (non-streaming) and streaming responses in async contexts. It sends the provided messages and parameters to the Venice AI API and returns either a complete response or a stream of partial responses.

The method automatically formats the request body, applies appropriate defaults, and routes the request to either the standard or streaming endpoint based on the stream parameter.

Parameters:

model (str) – ID of the model to use (e.g., "venice-1", "llama-3.3-70b").
messages (Sequence[venice_ai.types.chat.MessageParam]) – Sequence of messages forming the conversation.
stream (bool) – If True, stream back partial progress. Defaults to False. Returns an AsyncIterator[ChatCompletionChunk] if True, otherwise ChatCompletion.
stream_cls (Optional[Type[venice_ai.types.chat.ChunkModelFactory[venice_ai.types.chat.ChatCompletionChunk]]]) – Optional stream wrapper class for streaming responses. Must conform to the ChunkModelFactory protocol.
frequency_penalty (Optional[float]) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far.
max_tokens (Optional[int]) – Deprecated. Please use max_completion_tokens instead. The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length.
max_completion_tokens (Optional[int]) – Maximum number of tokens that can be generated in the chat completion.
n (Optional[int]) – Number of chat completion choices to generate for each input message.
presence_penalty (Optional[float]) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far.
response_format (Optional[venice_ai.types.chat.ResponseFormat]) – Specifies the format that the model must output (e.g., for JSON mode).
seed (Optional[int]) – Random seed for reproducible outputs.
stop (Optional[Union[str, Sequence[str]]]) – Up to 4 sequences where the API will stop generating further tokens.
temperature (Optional[float]) – Sampling temperature between 0.0 and 2.0. Higher values make output more random, lower values more focused and deterministic. Defaults to 0.7.
top_p (Optional[float]) – Nucleus sampling parameter between 0.0 and 1.0. Defaults to 1.0.
tools (Optional[Sequence[venice_ai.types.chat.Tool]]) – List of tools the model may call.
tool_choice (Optional[Union[Literal["none", "auto"], venice_ai.types.chat.ToolChoiceObject]]) – Controls which (if any) tool is called by the model. Can be "none", "auto", or a specific tool.
user (Optional[str]) – Unique identifier representing your end-user (discarded by API but supported for OpenAI compatibility).
venice_parameters (Optional[venice_ai.types.chat.VeniceParameters]) – Venice-specific parameters for fine-tuning model behavior.
logprobs (Optional[bool]) – Whether to return log probabilities of the output tokens.
top_logprobs (Optional[int]) – Number of most likely tokens to return at each token position if logprobs is True.
parallel_tool_calls (Optional[bool]) – Whether to enable parallel function calling during tool use.
repetition_penalty (Optional[float]) – Penalty for token repetition.
stop_token_ids (Optional[Sequence[int]]) – List of token IDs at which to stop generation.
top_k (Optional[int]) – Number of highest-probability vocabulary tokens to keep for top-k filtering.
stream_options (Optional[venice_ai.types.chat.StreamOptions]) – Additional options for controlling streaming behavior.
logit_bias (Optional[Dict[str, int]]) – Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100.
kwargs (typing.Any) – Additional keyword arguments.

Returns:

A ChatCompletion if stream is False, otherwise an AsyncIterator of ChatCompletionChunk.

Return type:

Union[venice_ai.types.chat.ChatCompletion, AsyncIterator[venice_ai.types.chat.ChatCompletionChunk]]

Raises:

venice_ai.exceptions.InvalidRequestError – If parameters are invalid or malformed.
venice_ai.exceptions.AuthenticationError – If the API key is invalid or missing.
venice_ai.exceptions.PermissionDeniedError – If access is denied to the requested model or feature.
venice_ai.exceptions.NotFoundError – If the model or resource is not found.
venice_ai.exceptions.RateLimitError – If rate limits are exceeded for the account.
venice_ai.exceptions.APIError – For other API-related errors not covered by specific exceptions.
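
The ranges given in the parameter descriptions can be checked locally before a request is sent. The sketch below is a hypothetical helper (check_chat_params is not part of venice_ai) that only encodes the limits documented on this page; the SDK and the API may apply their own, stricter validation.

```python
from typing import Any, Dict, List

# Ranges copied from the parameter descriptions on this page.
_RANGES = {
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
}


def check_chat_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper for illustration, not part of venice_ai.
    """
    problems: List[str] = []
    for name, (low, high) in _RANGES.items():
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside [{low}, {high}]")

    stop = params.get("stop")
    # A single string counts as one stop sequence, so only sequences are counted.
    if stop is not None and not isinstance(stop, str) and len(stop) > 4:
        problems.append(f"stop has {len(stop)} sequences; at most 4 are allowed")

    for token_id, bias in (params.get("logit_bias") or {}).items():
        if not (-100 <= bias <= 100):
            problems.append(f"logit_bias[{token_id!r}]={bias} is outside [-100, 100]")

    if params.get("max_tokens") is not None:
        problems.append("max_tokens is deprecated; use max_completion_tokens")
    return problems


if __name__ == "__main__":
    for problem in check_chat_params({"temperature": 2.5, "max_tokens": 100}):
        print(problem)
```

Returning a list instead of raising on the first problem lets a caller report every issue at once.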

Example

# Non-streaming async usage
import asyncio
from venice_ai import AsyncVeniceClient

async def main():
    client = AsyncVeniceClient(api_key="your-api-key")
    response = await client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain async programming in Python."}
        ],
        temperature=0.3
    )
    print(response["choices"][0]["message"]["content"])

asyncio.run(main())

# Async streaming usage
async def stream_example():
    client = AsyncVeniceClient(api_key="your-api-key")
    async for chunk in await client.chat.completions.create(
        model="venice-1",
        messages=[{"role": "user", "content": "Tell me a story."}],
        stream=True,
        max_completion_tokens=200
    ):
        content = chunk["choices"][0]["delta"].get("content", "")
        if content:
            print(content, end="", flush=True)

asyncio.run(stream_example())
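
When the full reply is needed after streaming, the content deltas can be accumulated as they arrive. A minimal sketch, assuming chunks are dict-like with the choices[0]["delta"] layout used in the example above; fake_stream is a hand-written stand-in for the iterator returned by create(stream=True), so the snippet runs without network access.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


async def collect_stream_text(stream: AsyncIterator[Dict[str, Any]]) -> str:
    """Join the content deltas of a chunk stream into one string."""
    parts: List[str] = []
    async for chunk in stream:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # tolerate chunks that carry no choices (assumption)
        content = choices[0].get("delta", {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts)


async def fake_stream() -> AsyncIterator[Dict[str, Any]]:
    """Stand-in for the SDK stream; yields hand-written chunks."""
    for piece in ("Once ", "upon ", "a time."):
        yield {"choices": [{"delta": {"content": piece}}]}
    # A closing chunk with an empty delta (assumed shape).
    yield {"choices": [{"delta": {}}]}


def demo() -> str:
    """Run the collector against the stand-in stream."""
    return asyncio.run(collect_stream_text(fake_stream()))


if __name__ == "__main__":
    print(demo())
```

With the real client, the argument would be the awaited result of client.chat.completions.create(..., stream=True).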

class venice_ai.resources.chat.completions.ChatCompletions(client: venice_ai._resource.SyncClientT)[source]

Bases: APIResource

Provides access to chat completion operations.

This class manages synchronous chat completion operations with Venice AI models, supporting both standard (non-streaming) and streaming response formats. It serves as the primary interface for chat-based interactions with Venice AI language models.

The class handles parameter validation, request formation, and response parsing for chat completion requests.

Parameters: _client (venice_ai._client.VeniceClient) – The client instance used to make API requests.

Example

from venice_ai import VeniceClient

# Initialize the client
client = VeniceClient(api_key="your-api-key")

# Create a chat completion
response = client.chat.completions.create(
    model="venice-1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about Venice AI."}
    ]
)

# Access the response content
print(response["choices"][0]["message"]["content"])

Parameters: client (typing.TypeVar(SyncClientT, bound=VeniceClient))

create(*, model: str, messages: Sequence[venice_ai.types.chat.MessageParam], stream: bool = False, stream_cls: Type[venice_ai.types.chat.ChunkModelFactory[venice_ai.types.chat.ChatCompletionChunk]] | None = None, **kwargs: Any)[source]

Create a model response for the given chat conversation.

This method handles the core functionality of the chat completions API, supporting both standard (non-streaming) and streaming responses. It sends the provided messages and parameters to the Venice AI API and returns either a complete response or a stream of partial responses.

The method automatically formats the request body, applies appropriate defaults, and routes the request to either the standard or streaming endpoint based on the stream parameter.

Parameters:

model (str) – ID of the model to use (e.g., "venice-1", "llama-3.3-70b").
messages (Sequence[venice_ai.types.chat.MessageParam]) – Sequence of messages forming the conversation.
stream (bool) – If True, stream back partial progress. Defaults to False. Returns an Iterator[ChatCompletionChunk] if True, otherwise ChatCompletion.
stream_cls (Optional[Type[venice_ai.types.chat.ChunkModelFactory[venice_ai.types.chat.ChatCompletionChunk]]]) – Optional stream wrapper class for streaming responses. Must conform to the ChunkModelFactory protocol.
frequency_penalty (Optional[float]) – Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far.
max_tokens (Optional[int]) – Deprecated. Please use max_completion_tokens instead. The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length.
max_completion_tokens (Optional[int]) – Maximum number of tokens that can be generated in the chat completion.
n (Optional[int]) – Number of chat completion choices to generate for each input message.
presence_penalty (Optional[float]) – Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far.
response_format (Optional[venice_ai.types.chat.ResponseFormat]) – Specifies the format that the model must output (e.g., for JSON mode).
seed (Optional[int]) – Random seed for reproducible outputs.
stop (Optional[Union[str, Sequence[str]]]) – Up to 4 sequences where the API will stop generating further tokens.
temperature (Optional[float]) – Sampling temperature between 0.0 and 2.0. Higher values make output more random, lower values more focused and deterministic. Defaults to 0.7.
top_p (Optional[float]) – Nucleus sampling parameter between 0.0 and 1.0. Defaults to 1.0.
tools (Optional[Sequence[venice_ai.types.chat.Tool]]) – List of tools the model may call.
tool_choice (Optional[Union[Literal["none", "auto"], venice_ai.types.chat.ToolChoiceObject]]) – Controls which (if any) tool is called by the model. Can be "none", "auto", or a specific tool.
user (Optional[str]) – Unique identifier representing your end-user (discarded by API but supported for OpenAI compatibility).
venice_parameters (Optional[venice_ai.types.chat.VeniceParameters]) – Venice-specific parameters for fine-tuning model behavior.
logprobs (Optional[bool]) – Whether to return log probabilities of the output tokens.
top_logprobs (Optional[int]) – Number of most likely tokens to return at each token position if logprobs is True.
parallel_tool_calls (Optional[bool]) – Whether to enable parallel function calling during tool use.
repetition_penalty (Optional[float]) – Penalty for token repetition.
stop_token_ids (Optional[Sequence[int]]) – List of token IDs at which to stop generation.
top_k (Optional[int]) – Number of highest-probability vocabulary tokens to keep for top-k filtering.
stream_options (Optional[venice_ai.types.chat.StreamOptions]) – Additional options for controlling streaming behavior.
logit_bias (Optional[Dict[str, int]]) – Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100.
kwargs (typing.Any) – Additional keyword arguments.

Returns:

A ChatCompletion if stream is False, otherwise an Iterator of ChatCompletionChunk.

Return type:

Union[venice_ai.types.chat.ChatCompletion, Iterator[venice_ai.types.chat.ChatCompletionChunk]]

Raises:

venice_ai.exceptions.InvalidRequestError – If parameters are invalid or malformed.
venice_ai.exceptions.AuthenticationError – If the API key is invalid or missing.
venice_ai.exceptions.PermissionDeniedError – If access is denied to the requested model or feature.
venice_ai.exceptions.NotFoundError – If the model or resource is not found.
venice_ai.exceptions.RateLimitError – If rate limits are exceeded for the account.
venice_ai.exceptions.APIError – For other API-related errors not covered by specific exceptions.

Example

# Non-streaming usage with system and user messages
from venice_ai import VeniceClient
client = VeniceClient(api_key="your-api-key")
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant specializing in Python."},
        {"role": "user", "content": "Write a function to calculate the Fibonacci sequence."}
    ],
    temperature=0.3  # More deterministic/focused response
)
print(response["choices"][0]["message"]["content"])

# Streaming usage with progress display
for chunk in client.chat.completions.create(
    model="venice-1",
    messages=[{"role": "user", "content": "Explain quantum computing briefly."}],
    stream=True,
    max_completion_tokens=250  # Limit response length
):
    content = chunk["choices"][0]["delta"].get("content", "")
    if content:
        print(content, end="", flush=True)

# Using tools/function calling
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "What's the weather in New York?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }]
)
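
The tools example above stops at the request. A minimal sketch of the next step follows, assuming the response uses the OpenAI-compatible layout (choices[0]["message"]["tool_calls"], each with an id and a function holding a name and JSON-encoded arguments). That layout is not confirmed on this page, and SAMPLE_RESPONSE, TOOL_REGISTRY, and run_tool_calls are illustrative names, not part of venice_ai.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(location: str, unit: str = "celsius") -> Dict[str, Any]:
    """Placeholder implementation; a real one would query a weather service."""
    return {"location": location, "unit": unit, "temperature": 21}


# Maps tool names declared in the request to local callables.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Execute each requested tool and return 'tool' role messages with the results."""
    message = response["choices"][0]["message"]
    results: List[Dict[str, Any]] = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        handler = TOOL_REGISTRY.get(function["name"])
        if handler is None:
            output: Any = {"error": f"unknown tool {function['name']}"}
        else:
            # Arguments are a JSON-encoded string in the assumed layout.
            output = handler(**json.loads(function["arguments"]))
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results


# Hand-written sample in the assumed layout, not real API output.
SAMPLE_RESPONSE = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": "{\"location\": \"New York\"}",
                },
            }],
        }
    }]
}

if __name__ == "__main__":
    print(run_tool_calls(SAMPLE_RESPONSE))
```

The returned messages would typically be appended to the conversation, after the assistant message that requested the tools, before calling create again.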

Models Resources¶

class venice_ai.resources.models.Models(client: venice_ai._resource.SyncClientT)[source]

Bases: APIResource

Provides access to model listing and capability information.

This class manages synchronous operations for retrieving information about available AI models, their traits, and compatibility mappings. It provides methods to list models, query model traits (semantic shortcuts), and get compatibility mappings between external model names and Venice model IDs.

Parameters: client (VeniceClient) – The Venice client instance used for API requests.

Example

Basic usage through a Venice client:

from venice_ai import VeniceClient

client = VeniceClient()
models = client.models.list()
for model in models.data:
    print(f"Model: {model.name} (ID: {model.id})")

list(*, type: Literal['embedding', 'image', 'text', 'tts', 'upscale'] | None = None)[source]

Lists available models.

Retrieves a list of AI models available through the Venice API. Models can optionally be filtered by type to narrow down results to specific categories such as text generation, image generation, or embedding models.

Parameters: type (Optional[venice_ai.types.models.ModelType]) – Optional filter for model type. Valid values include "text", "image", "embedding", "tts", and "upscale". If not provided, all available models are returned.
Returns: A list of available models with their metadata, capabilities, and pricing information.
Return type: venice_ai.types.models.ModelList
Raises: venice_ai.exceptions.APIError – If an API error occurs during the request.

Example

List all available models:

models = client.models.list()
for model in models.data:
    print(f"Model ID: {model.id}, Name: {model.name}")

Filter models by type:

chat_models = client.models.list(type="text")
image_models = client.models.list(type="image")

list_compatibility(*, type: Literal['embedding', 'image', 'text', 'tts', 'upscale'] | None = None)[source]

Lists model compatibility mapping between external model names and Venice model IDs.

Retrieves a mapping that allows applications to reference external model identifiers (e.g., from other AI platforms like OpenAI) and have them automatically mapped to equivalent Venice models. This compatibility layer facilitates smoother transitions when migrating applications from other AI platforms to Venice.

Parameters: type (Optional[venice_ai.types.models.ModelType]) – Optional filter for model type. Only compatibility mappings for models of the specified type will be returned. Valid values include "text", "image", "embedding", "tts", and "upscale".
Returns: A mapping of external model names to their equivalent Venice model IDs.
Return type: venice_ai.types.models.ModelCompatibilityList
Raises: venice_ai.exceptions.APIError – If an API error occurs during the request.

Example

Get all compatibility mappings:

compatibility = client.models.list_compatibility()
venice_model = compatibility.data.get("gpt-4")
print(f"GPT-4 maps to Venice model: {venice_model}")

Get compatibility for specific model type:

text_compat = client.models.list_compatibility(type="text")
for external_name, venice_id in text_compat.data.items():
    print(f"{external_name} -> {venice_id}")
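
For migrations, the mapping can be applied to request arguments that still carry an external model name. A minimal sketch, assuming compatibility.data behaves like a dict as in the examples above; translate_model is a hypothetical helper, and the mapping in fake_compat is made up for illustration.

```python
from typing import Any, Dict, Mapping


def translate_model(request: Mapping[str, Any], compat: Mapping[str, str]) -> Dict[str, Any]:
    """Return a copy of request with 'model' replaced by its Venice ID, if mapped."""
    translated = dict(request)
    # Unmapped names pass through unchanged so native Venice IDs keep working.
    translated["model"] = compat.get(request["model"], request["model"])
    return translated


# Made-up mapping standing in for compatibility.data.
fake_compat = {"gpt-4": "llama-3.3-70b"}

if __name__ == "__main__":
    print(translate_model({"model": "gpt-4", "temperature": 0.3}, fake_compat))
```

Copying the request instead of mutating it keeps the caller's original arguments available for logging.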

list_traits(*, type: Literal['embedding', 'image', 'text', 'tts', 'upscale'] | None = None)[source]

Lists model traits and their associated model IDs.

Retrieves a mapping of semantic trait names (e.g., “default”, “fastest”, “best”) to their corresponding model IDs. Traits provide convenient shortcuts for selecting models based on desired characteristics rather than specific model identifiers, making it easier to choose appropriate models without needing to know exact model versions or IDs.

Parameters: type (Optional[venice_ai.types.models.ModelType]) – Optional filter for model type. Only traits for models of the specified type will be returned. Valid values include "text", "image", "embedding", "tts", and "upscale".
Returns: A mapping of trait names to their corresponding model IDs.
Return type: venice_ai.types.models.ModelTraitList
Raises: venice_ai.exceptions.APIError – If an API error occurs during the request.

Example

Get all model traits:

traits = client.models.list_traits()
default_model = traits.data.get("default")
fastest_model = traits.data.get("fastest")

Get traits for specific model type:

text_traits = client.models.list_traits(type="text")
print(f"Default text model: {text_traits.data['default']}")
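Indexing with `text_traits.data['default']` raises `KeyError` if the trait is absent. A more defensive lookup can try several traits in order of preference. This sketch assumes `traits.data` is a mapping of trait names to model IDs, as described above. `pick_by_trait` and the sample IDs are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def pick_by_trait(traits: Mapping[str, str], preferred: Sequence[str]) -> Optional[str]:
    """Return the model ID of the first trait in `preferred` that is present."""
    for trait in preferred:
        model_id = traits.get(trait)
        if model_id is not None:
            return model_id
    return None


# Placeholder data; real values would come from traits.data.
example_traits = {"default": "example-default-model"}
chosen = pick_by_trait(example_traits, ["fastest", "default"])
```

Here `"fastest"` is missing from the placeholder mapping, so the helper falls through to `"default"`.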

class venice_ai.resources.models.AsyncModels(client: venice_ai._resource.AsyncClientT)[source]

Bases: AsyncAPIResource

Provides access to model listing and capability information (asynchronous).

This class manages asynchronous operations for retrieving information about available AI models, their traits, and compatibility mappings. It provides async methods to list models, query model traits (semantic shortcuts), and get compatibility mappings between external model names and Venice model IDs.

Parameters: client (AsyncVeniceClient) – The async Venice client instance used for API requests.

Example

Basic usage through an async Venice client:

import asyncio

from venice_ai import AsyncVeniceClient

async def main():
    client = AsyncVeniceClient()
    models = await client.models.list()
    for model in models.data:
        print(f"Model: {model.name} (ID: {model.id})")

# Run the coroutine on a new event loop.
asyncio.run(main())

async list(*, type: Literal['embedding', 'image', 'text', 'tts', 'upscale'] | None = None)[source]

Lists available models asynchronously.

Asynchronously retrieves a list of AI models available through the Venice API. Models can optionally be filtered by type to narrow down results to specific categories such as text generation, image generation, or embedding models.

Parameters: type (Optional[venice_ai.types.models.ModelType]) – Optional filter for model type. Valid values include "text", "image", "embedding", "tts", and "upscale". If not provided, all available models are returned.
Returns: A list of available models with their metadata, capabilities, and pricing information.
Return type: venice_ai.types.models.ModelList
Raises: venice_ai.exceptions.APIError – If an API error occurs during the request.

Example

List all available models:

models = await client.models.list()
for model in models.data:
    print(f"Model ID: {model.id}, Name: {model.name}")

Filter models by type:

chat_models = await client.models.list(type="text")
image_models = await client.models.list(type="image")

async list_compatibility(*, type: Literal['embedding', 'image', 'text', 'tts', 'upscale'] | None = None)[source]

Lists model compatibility mapping between external model names and Venice model IDs asynchronously.

Asynchronously retrieves a mapping that allows applications to reference external model identifiers (e.g., from other AI platforms like OpenAI) and have them automatically mapped to equivalent Venice models. This compatibility layer facilitates smoother transitions when migrating applications from other AI platforms to Venice.

Parameters: type (Optional[venice_ai.types.models.ModelType]) – Optional filter for model type. Only compatibility mappings for models of the specified type will be returned. Valid values include "text", "image", "embedding", "tts", and "upscale".
Returns: A mapping of external model names to their equivalent Venice model IDs.
Return type: venice_ai.types.models.ModelCompatibilityList
Raises: venice_ai.exceptions.APIError – If an API error occurs during the request.

Example

Get all compatibility mappings:

compatibility = await client.models.list_compatibility()
venice_model = compatibility.data.get("gpt-4")
print(f"GPT-4 maps to Venice model: {venice_model}")

Get compatibility for specific model type:

text_compat = await client.models.list_compatibility(type="text")
for external_name, venice_id in text_compat.data.items():
    print(f"{external_name} -> {venice_id}")

async list_traits(*, type: Literal['embedding', 'image', 'text', 'tts', 'upscale'] | None = None)[source]

Lists model traits and their associated model IDs asynchronously.

Asynchronously retrieves a mapping of semantic trait names (e.g., “default”, “fastest”, “best”) to their corresponding model IDs. Traits provide convenient shortcuts for selecting models based on desired characteristics rather than specific model identifiers, making it easier to choose appropriate models without needing to know exact model versions or IDs.

Parameters: type (Optional[venice_ai.types.models.ModelType]) – Optional filter for model type. Only traits for models of the specified type will be returned. Valid values include "text", "image", "embedding", "tts", and "upscale".
Returns: A mapping of trait names to their corresponding model IDs.
Return type: venice_ai.types.models.ModelTraitList
Raises: venice_ai.exceptions.APIError – If an API error occurs during the request.

Example

Get all model traits:

traits = await client.models.list_traits()
default_model = traits.data.get("default")
fastest_model = traits.data.get("fastest")

Get traits for specific model type:

text_traits = await client.models.list_traits(type="text")
print(f"Default text model: {text_traits.data['default']}")
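Because the three `AsyncModels` methods are independent, they can be awaited concurrently rather than one after another. The sketch below uses the standard library's `asyncio.gather`. `fetch_model_catalog` is a hypothetical helper, and `client` is assumed to be any object exposing the `models.list()`, `models.list_traits()`, and `models.list_compatibility()` coroutines documented above.

```python
import asyncio
from typing import Any, Tuple


async def fetch_model_catalog(client: Any) -> Tuple[Any, Any, Any]:
    """Request models, traits, and compatibility mappings concurrently."""
    models, traits, compatibility = await asyncio.gather(
        client.models.list(),
        client.models.list_traits(),
        client.models.list_compatibility(),
    )
    return models, traits, compatibility
```

If any of the three requests raises, `asyncio.gather` propagates the first exception to the caller, so the usual `APIError` handling still applies around the `await`.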

Image Resources¶

Resource for interacting with the Venice AI image-related API endpoints.

This module provides both synchronous and asynchronous classes for generating images, upscaling images, and listing available image styles. It implements the core functionality for the Venice AI image generation services through a clean, typed interface matching the API specification.

class venice_ai.resources.image.AsyncImage(client: venice_ai._resource.AsyncClientT)[source]

Bases: AsyncAPIResource

Provides access to asynchronous image generation, upscaling, and style listing operations.

This class manages asynchronous image operations using Venice AI’s image API. It mirrors the Image class functionality but uses non-blocking async/await operations for use in asyncio applications.

All methods return awaitable coroutines. For synchronous calls, use the Image class.

Parameters: client (venice_ai._async_client.AsyncVeniceClient) – The async Venice AI client instance used for making API requests.

Generate an image using Venice AI’s image generation API asynchronously.

This method creates a new image based on a text prompt using the specified model, executing the request asynchronously for use in async/await contexts. It provides comprehensive control over the image generation process with multiple parameters to customize the output.

Parameters:

model (str) – Model ID for image generation (e.g., "venice-sd35").
prompt (str) – Text prompt describing the image to generate.
cfg_scale (Optional[float]) – Optional. Classifier Free Guidance scale (1.0-30.0). Higher values adhere more strictly to the prompt.
embed_exif_metadata (Optional[bool]) – Optional. If True, embed generation metadata in EXIF data.
format (Optional[Literal["jpeg", "png", "webp"]]) – Optional. Output image format.
height (Optional[int]) – Optional. Height of the generated image in pixels.
hide_watermark (Optional[bool]) – Optional. If True, hide Venice AI watermark from the generated image.
lora_strength (Optional[int]) – Optional. Strength of LoRA model adaptation (0-100).
num_images (Optional[int]) – Optional. Number of images to generate (typically 1-10).
negative_prompt (Optional[str]) – Optional. Text describing what to avoid in the generated image.
return_binary (Optional[bool]) – Optional. If True, return raw image bytes instead of JSON response with base64 data.
safe_mode (Optional[bool]) – Optional. If True, enable content filtering for safer outputs.
seed (Optional[int]) – Optional. Random seed for reproducible image generation results.
steps (Optional[int]) – Optional. Number of diffusion steps. Higher values generally improve quality but increase generation time.
style_preset (Optional[str]) – Optional. Style preset ID from list_styles() to apply to the generated image.
width (Optional[int]) – Optional. Width of the generated image in pixels.

Returns:

Response containing generated image data as base64 string, or raw image bytes if return_binary is True.

Return type:

Union[ImageResponse, bytes]

Raises:

venice_ai.exceptions.APIError – If an API error occurs during image generation.

Example:

# client = AsyncVeniceClient()
response = await client.image.generate(
    model="venice-sd35",
    prompt="A serene landscape with mountains and a lake",
    width=1024,
    height=768,
    steps=30
)
# Process response.images[0] (base64 string)
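The comment above notes that `response.images[0]` is a base64 string. One way to process it is to decode it and write the bytes to disk, as in this minimal sketch using only the standard library. `save_base64_image` is a hypothetical helper, not an SDK function.

```python
import base64
from pathlib import Path
from typing import Union


def save_base64_image(b64_data: str, path: Union[str, Path]) -> Path:
    """Decode a base64-encoded image and write the raw bytes to `path`."""
    target = Path(path)
    target.write_bytes(base64.b64decode(b64_data))
    return target
```

A call such as `save_base64_image(response.images[0], "landscape.png")` would then store the first generated image. The file extension should match the `format` requested.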

async get_available_styles()[source]

Retrieve the list of available image generation styles from the API asynchronously.

This method fetches the most up-to-date list of styles that can be used for image generation, such as ‘cinematic’, ‘photorealistic’, etc. It performs this operation asynchronously for use in async/await contexts.

Returns: An object containing a list of available image styles.
Return type: ImageStyleList
Raises: venice_ai.exceptions.APIError – If an API error occurs during the request.

Example:

# client = AsyncVeniceClient()
styles_response = await client.image.get_available_styles()
for style_name in styles_response.data:
    print(f"Available style: {style_name}")
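The retrieved style names can be used to check a `style_preset` value before it is passed to `generate()`. This is a sketch under the assumption that iterating `styles_response.data` yields style names as strings, as the loop above suggests. `require_style` is a hypothetical helper.

```python
from typing import Iterable


def require_style(available: Iterable[str], wanted: str) -> str:
    """Return `wanted` if it is an available style, otherwise raise ValueError."""
    names = list(available)
    if wanted not in names:
        raise ValueError(
            f"Unknown style preset {wanted!r}; expected one of {sorted(names)}"
        )
    return wanted
```

Failing early with the list of valid names avoids spending an API call on a request that uses an unknown preset.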

async list_styles()[source]

List available image style presets asynchronously for use with image generation.

This method retrieves all available style presets that can be used with the style_preset parameter in the generate() method to influence the aesthetic and artistic style of generated images. It performs this operation asynchronously.

Returns: A list of available image style presets with their identifiers.
Return type: ImageStyleList
Raises: venice_ai.exceptions.APIError – If an API error occurs while retrieving styles.

async simple_generate(*, model: str, prompt: str, background: Literal['transparent', 'opaque', 'auto'] | None = None, moderation: Literal['low', 'auto'] | None = None, n: int | None = None, output_compression: int | None = None, output_format: Literal['jpeg', 'png', 'webp'] | None = None, quality: Literal['auto', 'high', 'medium', 'low', 'hd', 'standard'] | None = None, response_format: Literal['b64_json', 'url'] | None = None, size: Literal['auto', '256x256', '512x512', '1024x1024', '1536x1024', '1024x1536', '1792x1024', '1024x1792'] | None = None, style: Literal['vivid', 'natural'] | None = None, user: str | None = None)[source]

Generate an image using Venice AI’s simple image generation API asynchronously (OpenAI-compatible).

This method provides a simplified interface for image generation that’s compatible with OpenAI’s DALL-E API format, executed asynchronously for use in async/await contexts. It’s designed to be easier to use than the full generate() method while still providing essential customization options.

Parameters:

model (str) – Model ID for image generation.
prompt (str) – Text prompt describing the image to generate.
background (Optional[Literal["transparent", "opaque", "auto"]]) – Optional. Background style for the generated image.
moderation (Optional[Literal["low", "auto"]]) – Optional. Content moderation level to apply.
n (Optional[int]) – Optional. Number of images to generate (typically 1-10).
output_compression (Optional[int]) – Optional. Output image compression level (0-100, where 100 is highest quality).
output_format (Optional[Literal["jpeg", "png", "webp"]]) – Optional. Output image format.
quality (Optional[Literal["auto", "high", "medium", "low", "hd", "standard"]]) – Optional. Image quality setting.
response_format (Optional[Literal["b64_json", "url"]]) – Optional. Format of the response data.
size (Optional[Literal["auto", "256x256", "512x512", "1024x1024", "1536x1024", "1024x1536", "1792x1024", "1024x1792"]]) – Optional. Dimensions of the generated image.
style (Optional[Literal["vivid", "natural"]]) – Optional. Style of the generated image.
user (Optional[str]) – Optional. User identifier for tracking and analytics purposes.

Returns:

API response containing generated images as base64 data or URLs.

Return type:

SimpleImageResponse

Raises:

venice_ai.exceptions.APIError – If an API error occurs during image generation.

Example:

# client = AsyncVeniceClient()
response = await client.image.simple_generate(
    model="venice-sd35",
    prompt="A cute cat sitting on a windowsill",
    size="1024x1024",
    style="natural"
)
# Process response.data[0] (ImageDataItem)
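`simple_generate()` takes dimensions as a single `size` string, whereas `generate()` takes separate `width` and `height` integers. When code needs to move between the two, a small parser like the hypothetical `parse_size` below may help. Returning `None` for `"auto"` is a choice made for this sketch, not documented SDK behavior.

```python
from typing import Optional, Tuple


def parse_size(size: str) -> Optional[Tuple[int, int]]:
    """Split a 'WIDTHxHEIGHT' string into integers; 'auto' yields None."""
    if size == "auto":
        return None
    width, height = size.split("x")
    return int(width), int(height)
```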

Upscale an image using Venice AI’s image upscaling API asynchronously.

This method allows for increasing the resolution of an image while maintaining or enhancing its quality using Venice AI’s upscaling technology, in an asynchronous manner compatible with asyncio applications.

Parameters:

image (Union[str, bytes, BinaryIO]) – Image to upscale. Can be a file path (string), raw image bytes, or a file-like object.
enhance (Optional[bool]) – Optional. Whether to enhance image quality during upscaling.
enhance_creativity (Optional[float]) – Optional. Creativity level for enhancement (0.0-1.0, where 1.0 is most creative).
enhance_prompt (Optional[bool]) – Optional. Whether to use text prompt guidance for enhancement.
replication (Optional[float]) – Optional. Replication factor for matching the original image (0.0-1.0, where 1.0 matches exactly).
scale (Optional[float]) – Optional. Scaling factor for upscaling (e.g., 2.0 for 2x upscaling).
timeout (Optional[Union[float, httpx.Timeout]]) – Optional. Request timeout configuration.

Returns:

Raw bytes of the upscaled image.

Return type:

bytes

Raises:

ValueError – If image path is invalid or image type is unsupported.
TypeError – If image content type is unsupported.
venice_ai.exceptions.APIError – If an API error occurs during upscaling.
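The `image` parameter accepts a file path, raw bytes, or a file-like object. The following sketch shows one way such input could be normalized to bytes, with errors that mirror the `ValueError` and `TypeError` cases listed above. It is an illustration only and not the SDK's actual implementation. `read_image_bytes` is a hypothetical name.

```python
from pathlib import Path
from typing import BinaryIO, Union


def read_image_bytes(image: Union[str, bytes, BinaryIO]) -> bytes:
    """Normalize a path, raw bytes, or binary file-like object to bytes."""
    if isinstance(image, (bytes, bytearray)):
        return bytes(image)
    if isinstance(image, str):
        path = Path(image)
        if not path.is_file():
            raise ValueError(f"Image path does not exist: {image}")
        return path.read_bytes()
    if hasattr(image, "read"):
        data = image.read()
        if not isinstance(data, bytes):
            raise TypeError("File-like object must be opened in binary mode")
        return data
    raise TypeError(f"Unsupported image type: {type(image).__name__}")
```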

class venice_ai.resources.image.Image(client: venice_ai._resource.SyncClientT)[source]

Bases: APIResource

Provides access to image generation, upscaling, and style listing operations.

This class manages synchronous image operations using Venice AI’s image API. It encapsulates functionality for image generation, upscaling, and style listing through a clean, typed interface that makes synchronous HTTP requests.

All methods in this class make synchronous HTTP requests. For non-blocking calls, use the AsyncImage class.

Parameters: client (venice_ai._client.VeniceClient) – The Venice AI client instance used for making API requests.

Generate an image using Venice AI’s image generation API.

This method creates a new image based on a text prompt using the specified model. It provides comprehensive control over the image generation process with multiple parameters to customize the output.

Parameters:

model (str) – Model ID for image generation (e.g., "venice-sd35").
prompt (str) – Text prompt describing the image to generate.
cfg_scale (Optional[float]) – Optional. Classifier Free Guidance scale (1.0-30.0). Higher values adhere more strictly to the prompt.
embed_exif_metadata (Optional[bool]) – Optional. If True, embed generation metadata in EXIF data.
format (Optional[Literal["jpeg", "png", "webp"]]) – Optional. Output image format.
height (Optional[int]) – Optional. Height of the generated image in pixels.
hide_watermark (Optional[bool]) – Optional. If True, hide Venice AI watermark from the generated image.
lora_strength (Optional[int]) – Optional. Strength of LoRA model adaptation (0-100).
num_images (Optional[int]) – Optional. Number of images to generate (typically 1-10).
negative_prompt (Optional[str]) – Optional. Text describing what to avoid in the generated image.
return_binary (Optional[bool]) – Optional. If True, return raw image bytes instead of JSON response with base64 data.
safe_mode (Optional[bool]) – Optional. If True, enable content filtering for safer outputs.
seed (Optional[int]) – Optional. Random seed for reproducible image generation results.
steps (Optional[int]) – Optional. Number of diffusion steps. Higher values generally improve quality but increase generation time.
style_preset (Optional[str]) – Optional. Style preset ID from list_styles() to apply to the generated image.
width (Optional[int]) – Optional. Width of the generated image in pixels.

Returns:

Response containing generated image data as base64 string, or raw image bytes if return_binary is True.

Return type:

Union[ImageResponse, bytes]

Raises:

venice_ai.exceptions.APIError – If an API error occurs during image generation.

Example:

# client = VeniceClient()
response = client.image.generate(
    model="venice-sd35",
    prompt="A serene landscape with mountains and a lake",
    width=1024,
    height=768,
    steps=30
)
# Process response.images[0] (base64 string)
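Because `generate()` returns either a response object or raw bytes depending on `return_binary`, calling code may want to treat both shapes uniformly. This sketch assumes only what is documented above: a non-binary response exposes `images` as a list of base64 strings. `images_as_bytes` is a hypothetical helper.

```python
import base64
from typing import Any, List


def images_as_bytes(result: Any) -> List[bytes]:
    """Normalize either return shape of generate() to a list of image bytes."""
    if isinstance(result, bytes):
        # return_binary=True: the result is already one raw image.
        return [result]
    # Otherwise decode each base64 string in the response.
    return [base64.b64decode(item) for item in result.images]
```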

get_available_styles()[source]

Retrieve the list of available image generation styles from the API.

This method fetches the most up-to-date list of styles that can be used for image generation, such as ‘cinematic’, ‘photorealistic’, etc.

Returns: An object containing a list of available image styles.
Return type: ImageStyleList
Raises: venice_ai.exceptions.APIError – If an API error occurs during the request.

Example:

# client = VeniceClient()
styles_response = client.image.get_available_styles()
for style_name in styles_response.data:
    print(f"Available style: {style_name}")

list_styles()[source]

List available image style presets for use with image generation.

This method retrieves all available style presets that can be used with the style_preset parameter in the generate() method to influence the aesthetic and artistic style of generated images.

Returns: A list of available image style presets with their identifiers.
Return type: ImageStyleList
Raises: venice_ai.exceptions.APIError – If an API error occurs while retrieving styles.

simple_generate(*, model: str, prompt: str, background: Literal['transparent', 'opaque', 'auto'] | None = None, moderation: Literal['low', 'auto'] | None = None, n: int | None = None, output_compression: int | None = None, output_format: Literal['jpeg', 'png', 'webp'] | None = None, quality: Literal['auto', 'high', 'medium', 'low', 'hd', 'standard'] | None = None, response_format: Literal['b64_json', 'url'] | None = None, size: Literal['auto', '256x256', '512x512', '1024x1024', '1536x1024', '1024x1536', '1792x1024', '1024x1792'] | None = None, style: Literal['vivid', 'natural'] | None = None, user: str | None = None)[source]

Generate an image using Venice AI’s simple image generation API (OpenAI-compatible).

This method provides a simplified interface for image generation that’s compatible with OpenAI’s DALL-E API format. It’s designed to be easier to use than the full generate() method while still providing essential customization options.

Parameters:

model (str) – Model ID for image generation.
prompt (str) – Text prompt describing the image to generate.
background (Optional[Literal["transparent", "opaque", "auto"]]) – Optional. Background style for the generated image.
moderation (Optional[Literal["low", "auto"]]) – Optional. Content moderation level to apply.
n (Optional[int]) – Optional. Number of images to generate (typically 1-10).
output_compression (Optional[int]) – Optional. Output image compression level (0-100, where 100 is highest quality).
output_format (Optional[Literal["jpeg", "png", "webp"]]) – Optional. Output image format.
quality (Optional[Literal["auto", "high", "medium", "low", "hd", "standard"]]) – Optional. Image quality setting.
response_format (Optional[Literal["b64_json", "url"]]) – Optional. Format of the response data.
size (Optional[Literal["auto", "256x256", "512x512", "1024x1024", "1536x1024", "1024x1536", "1792x1024", "1024x1792"]]) – Optional. Dimensions of the generated image.
style (Optional[Literal["vivid", "natural"]]) – Optional. Style of the generated image.
user (Optional[str]) – Optional. User identifier for tracking and analytics purposes.

Returns:

API response containing generated images as base64 data or URLs.

Return type:

SimpleImageResponse

Raises:

venice_ai.exceptions.APIError – If an API error occurs during image generation.

Example:

# client = VeniceClient()
response = client.image.simple_generate(
    model="venice-sd35",
    prompt="A cute cat sitting on a windowsill",
    size="1024x1024",
    style="natural"
)
# Process response.data[0] (ImageDataItem)

Upscale an image using Venice AI’s image upscaling API.

This method allows for increasing the resolution of an image while maintaining or enhancing its quality using Venice AI’s upscaling technology.

Parameters:

image (Union[str, bytes, BinaryIO]) – Image to upscale. Can be a file path (string), raw image bytes, or a file-like object.
enhance (Optional[bool]) – Optional. Whether to enhance image quality during upscaling.
enhance_creativity (Optional[float]) – Optional. Creativity level for enhancement (0.0-1.0, where 1.0 is most creative).
enhance_prompt (Optional[bool]) – Optional. Whether to use text prompt guidance for enhancement.
replication (Optional[float]) – Optional. Replication factor for matching the original image (0.0-1.0, where 1.0 matches exactly).
scale (Optional[float]) – Optional. Scaling factor for upscaling (e.g., 2.0 for 2x upscaling).
timeout (Optional[Union[float, httpx.Timeout]]) – Optional. Request timeout configuration.

Returns:

Raw bytes of the upscaled image.

Return type:

bytes

Raises:

ValueError – If image path is invalid or image type is unsupported.
TypeError – If image content type is unsupported.
venice_ai.exceptions.APIError – If an API error occurs during upscaling.
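Since the upscaled image comes back as raw bytes with no accompanying metadata, the file format may need to be inferred before saving. The sketch below checks the well-known leading signatures of PNG, JPEG, and WebP files. `detect_image_format` is a hypothetical helper, and which formats the upscaling endpoint actually returns is not stated here.

```python
from typing import Optional


def detect_image_format(data: bytes) -> Optional[str]:
    """Guess the image format from its leading bytes, or return None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # WebP is a RIFF container: 'RIFF', a 4-byte size, then 'WEBP'.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

The returned name can be used to pick a file extension when writing the bytes to disk.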

class venice_ai.resources.image.Image(client: venice_ai._resource.SyncClientT)[source]

Bases: APIResource

Provides access to image generation, upscaling, and style listing operations.

All methods in this class make synchronous HTTP requests. For non-blocking calls, use the AsyncImage class.

Parameters:: client (venice_ai._client.VeniceClient) – The Venice AI client instance used for making API requests.

Generate an image using Venice AI’s image generation API.

This method creates a new image based on a text prompt using the specified model. It provides comprehensive control over the image generation process with multiple parameters to customize the output.

Parameters:

model (str) – Model ID for image generation (e.g., "venice-sd35").
prompt (str) – Text prompt describing the image to generate.
cfg_scale (Optional[float]) – Optional. Classifier Free Guidance scale (1.0-30.0). Higher values adhere more strictly to the prompt.
embed_exif_metadata (Optional[bool]) – Optional. If True, embed generation metadata in EXIF data.
format (Optional[Literal["jpeg", "png", "webp"]]) – Optional. Output image format.
height (Optional[int]) – Optional. Height of the generated image in pixels.
hide_watermark (Optional[bool]) – Optional. If True, hide Venice AI watermark from the generated image.
lora_strength (Optional[int]) – Optional. Strength of LoRA model adaptation (0-100).
num_images (Optional[int]) – Optional. Number of images to generate (typically 1-10).
negative_prompt (Optional[str]) – Optional. Text describing what to avoid in the generated image.
return_binary (Optional[bool]) – Optional. If True, return raw image bytes instead of JSON response with base64 data.
safe_mode (Optional[bool]) – Optional. If True, enable content filtering for safer outputs.
seed (Optional[int]) – Optional. Random seed for reproducible image generation results.
steps (Optional[int]) – Optional. Number of diffusion steps. Higher values generally improve quality but increase generation time.
style_preset (Optional[str]) – Optional. Style preset ID from list_styles() to apply to the generated image.
width (Optional[int]) – Optional. Width of the generated image in pixels.

Returns:

Response containing generated image data as base64 string, or raw image bytes if return_binary is True.

Return type:

Union[ImageResponse, bytes]

Raises:

venice_ai.exceptions.APIError – If an API error occurs during image generation.

Example:

# client = VeniceClient()
response = client.image.generate(
    model="venice-sd35",
    prompt="A serene landscape with mountains and a lake",
    width=1024,
    height=768,
    steps=30
)
# Process response.images[0] (base64 string)

list_styles()[source]

List available image style presets for use with image generation.

This method retrieves all available style presets that can be used with the style_preset parameter in the generate() method to influence the aesthetic and artistic style of generated images.

Returns:: A list of available image style presets with their identifiers.
Return type:: ImageStyleList
Raises:: venice_ai.exceptions.APIError – If an API error occurs while retrieving styles.

Generate an image using Venice AI’s simple image generation API (OpenAI-compatible).

Parameters:

model (str) – Model ID for image generation.
prompt (str) – Text prompt describing the image to generate.
background (Optional[Literal["transparent", "opaque", "auto"]]) – Optional. Background style for the generated image.
moderation (Optional[Literal["low", "auto"]]) – Optional. Content moderation level to apply.
n (Optional[int]) – Optional. Number of images to generate (typically 1-10).
output_compression (Optional[int]) – Optional. Output image compression level (0-100, where 100 is highest quality).
output_format (Optional[Literal["jpeg", "png", "webp"]]) – Optional. Output image format.
quality (Optional[Literal["auto", "high", "medium", "low", "hd", "standard"]]) – Optional. Image quality setting.
response_format (Optional[Literal["b64_json", "url"]]) – Optional. Format of the response data.
size (Optional[Literal["auto", "256x256", "512x512", "1024x1024", "1536x1024", "1024x1536", "1792x1024", "1024x1792"]]) – Optional. Dimensions of the generated image.
style (Optional[Literal["vivid", "natural"]]) – Optional. Style of the generated image.
user (Optional[str]) – Optional. User identifier for tracking and analytics purposes.

Returns:

API response containing generated images as base64 data or URLs.

Return type:

SimpleImageResponse

Raises:

venice_ai.exceptions.APIError – If an API error occurs during image generation.

Example:

# client = VeniceClient()
response = client.image.simple_generate(
    model="venice-sd35",
    prompt="A cute cat sitting on a windowsill",
    size="1024x1024",
    style="natural"
)
# Process response.data[0] (ImageDataItem)

Upscale an image using Venice AI’s image upscaling API.

This method allows for increasing the resolution of an image while maintaining or enhancing its quality using Venice AI’s upscaling technology.

Parameters:

image (Union[str, bytes, BinaryIO]) – Image to upscale. Can be a file path (string), raw image bytes, or a file-like object.
enhance (Optional[bool]) – Optional. Whether to enhance image quality during upscaling.
enhance_creativity (Optional[float]) – Optional. Creativity level for enhancement (0.0-1.0, where 1.0 is most creative).
enhance_prompt (Optional[bool]) – Optional. Whether to use text prompt guidance for enhancement.
replication (Optional[float]) – Optional. Replication factor for matching the original image (0.0-1.0, where 1.0 matches exactly).
scale (Optional[float]) – Optional. Scaling factor for upscaling (e.g., 2.0 for 2x upscaling).
timeout (Optional[Union[float, httpx.Timeout]]) – Optional. Request timeout configuration.

Returns:

Raw bytes of the upscaled image.

Return type:

bytes

Raises:

ValueError – If image path is invalid or image type is unsupported.
TypeError – If image content type is unsupported.
venice_ai.exceptions.APIError – If an API error occurs during upscaling.

class venice_ai.resources.image.AsyncImage(client: venice_ai._resource.AsyncClientT)[source]

Bases: AsyncAPIResource

Provides access to asynchronous image generation, upscaling, and style listing operations.

All methods return awaitable coroutines. For synchronous calls, use the Image class.

Parameters:: client (venice_ai._async_client.AsyncVeniceClient) – The async Venice AI client instance used for making API requests.

Generate an image using Venice AI’s image generation API asynchronously.

Parameters:

model (str) – Model ID for image generation (e.g., "venice-sd35").
prompt (str) – Text prompt describing the image to generate.
cfg_scale (Optional[float]) – Optional. Classifier Free Guidance scale (1.0-30.0). Higher values adhere more strictly to the prompt.
embed_exif_metadata (Optional[bool]) – Optional. If True, embed generation metadata in EXIF data.
format (Optional[Literal["jpeg", "png", "webp"]]) – Optional. Output image format.
height (Optional[int]) – Optional. Height of the generated image in pixels.
hide_watermark (Optional[bool]) – Optional. If True, hide Venice AI watermark from the generated image.
lora_strength (Optional[int]) – Optional. Strength of LoRA model adaptation (0-100).
num_images (Optional[int]) – Optional. Number of images to generate (typically 1-10).
negative_prompt (Optional[str]) – Optional. Text describing what to avoid in the generated image.
return_binary (Optional[bool]) – Optional. If True, return raw image bytes instead of JSON response with base64 data.
safe_mode (Optional[bool]) – Optional. If True, enable content filtering for safer outputs.
seed (Optional[int]) – Optional. Random seed for reproducible image generation results.
steps (Optional[int]) – Optional. Number of diffusion steps. Higher values generally improve quality but increase generation time.
style_preset (Optional[str]) – Optional. Style preset ID from list_styles() to apply to the generated image.
width (Optional[int]) – Optional. Width of the generated image in pixels.

Returns:

Response containing generated image data as base64 string, or raw image bytes if return_binary is True.

Return type:

Union[ImageResponse, bytes]

Raises:

venice_ai.exceptions.APIError – If an API error occurs during image generation.

Example:

# client = AsyncVeniceClient()
response = await client.image.generate(
    model="venice-sd35",
    prompt="A serene landscape with mountains and a lake",
    width=1024,
    height=768,
    steps=30
)
# Process response.images[0] (base64 string)

async list_styles()[source]

List available image style presets asynchronously for use with image generation.

Returns:: A list of available image style presets with their identifiers.
Return type:: ImageStyleList
Raises:: venice_ai.exceptions.APIError – If an API error occurs while retrieving styles.

Generate an image using Venice AI’s simple image generation API asynchronously (OpenAI-compatible).

Parameters:

model (str) – Model ID for image generation.
prompt (str) – Text prompt describing the image to generate.
background (Optional[Literal["transparent", "opaque", "auto"]]) – Optional. Background style for the generated image.
moderation (Optional[Literal["low", "auto"]]) – Optional. Content moderation level to apply.
n (Optional[int]) – Optional. Number of images to generate (typically 1-10).
output_compression (Optional[int]) – Optional. Output image compression level (0-100, where 100 is highest quality).
output_format (Optional[Literal["jpeg", "png", "webp"]]) – Optional. Output image format.
quality (Optional[Literal["auto", "high", "medium", "low", "hd", "standard"]]) – Optional. Image quality setting.
response_format (Optional[Literal["b64_json", "url"]]) – Optional. Format of the response data.
size (Optional[Literal["auto", "256x256", "512x512", "1024x1024", "1536x1024", "1024x1536", "1792x1024", "1024x1792"]]) – Optional. Dimensions of the generated image.
style (Optional[Literal["vivid", "natural"]]) – Optional. Style of the generated image.
user (Optional[str]) – Optional. User identifier for tracking and analytics purposes.

Returns:

API response containing generated images as base64 data or URLs.

Return type:

SimpleImageResponse

Raises:

venice_ai.exceptions.APIError – If an API error occurs during image generation.

Example:

# client = AsyncVeniceClient()
response = await client.image.simple_generate(
    model="venice-sd35",
    prompt="A cute cat sitting on a windowsill",
    size="1024x1024",
    style="natural"
)
# Process response.data[0] (ImageDataItem)
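
Because size accepts only a fixed set of strings, callers that start from arbitrary pixel dimensions need to map them onto one of the listed values. The sketch below picks the listed size closest in aspect ratio, breaking ties by pixel area. The helper closest_size and the tie-breaking rule are assumptions made for illustration; the SDK does not document such a function, and "auto" is left out on purpose.

```python
from typing import Tuple

# Concrete values of the `size` parameter listed above ("auto" omitted).
SUPPORTED_SIZES = (
    "256x256", "512x512", "1024x1024", "1536x1024",
    "1024x1536", "1792x1024", "1024x1792",
)


def closest_size(width: int, height: int) -> str:
    """Return the supported size closest in aspect ratio, then in area."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target_ratio = width / height
    target_area = width * height

    def distance(size: str) -> Tuple[float, int]:
        w, h = (int(part) for part in size.split("x"))
        # Tuples compare element by element: ratio first, area second.
        return (abs(w / h - target_ratio), abs(w * h - target_area))

    return min(SUPPORTED_SIZES, key=distance)
```

The result can be passed straight to the size argument, for example size=closest_size(1920, 1080).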

Upscale an image using Venice AI’s image upscaling API asynchronously.

Parameters:

image (Union[str, bytes, BinaryIO]) – Image to upscale. Can be a file path (string), raw image bytes, or a file-like object.
enhance (Optional[bool]) – Optional. Whether to enhance image quality during upscaling.
enhance_creativity (Optional[float]) – Optional. Creativity level for enhancement (0.0-1.0, where 1.0 is most creative).
enhance_prompt (Optional[bool]) – Optional. Whether to use text prompt guidance for enhancement.
replication (Optional[float]) – Optional. Replication factor for matching the original image (0.0-1.0, where 1.0 matches exactly).
scale (Optional[float]) – Optional. Scaling factor for upscaling (e.g., 2.0 for 2x upscaling).
timeout (Optional[Union[float, httpx.Timeout]]) – Optional. Request timeout configuration.

Returns:

Raw bytes of the upscaled image.

Return type:

bytes

Raises:

ValueError – If the image path is invalid or the image type is unsupported.
TypeError – If image content type is unsupported.
venice_ai.exceptions.APIError – If an API error occurs during upscaling.
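
The three accepted forms of image (file path, raw bytes, file-like object) can be normalized to bytes before a request is built. The following is a sketch of one way to do that, not the SDK's actual implementation; read_image_bytes is a hypothetical name, and the choice of ValueError for a bad path and TypeError for an unsupported object only mirrors the Raises list above.

```python
from pathlib import Path
from typing import BinaryIO, Union


def read_image_bytes(image: Union[str, bytes, BinaryIO]) -> bytes:
    """Normalize a path, bytes, or binary file-like object to raw bytes."""
    if isinstance(image, (bytes, bytearray)):
        return bytes(image)
    if isinstance(image, str):
        path = Path(image)
        if not path.is_file():
            raise ValueError(f"Image path does not exist: {image}")
        return path.read_bytes()
    if hasattr(image, "read"):
        data = image.read()
        # A file opened in text mode returns str, which cannot be uploaded.
        if not isinstance(data, bytes):
            raise TypeError("File-like object must be opened in binary mode")
        return data
    raise TypeError(f"Unsupported image type: {type(image).__name__}")
```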

API Keys Resources¶

Resource module for interacting with the Venice API keys endpoints.

This module provides both synchronous and asynchronous resource classes for managing API keys. API keys are used for authentication and authorization when making requests to the Venice API. They control access to various Venice API features and endpoints, and are subject to rate limits that govern the number of requests that can be made within a specific time period.

Classes:

ApiKeys – Synchronous client for API key management.
AsyncApiKeys – Asynchronous client for API key management.

class venice_ai.resources.api_keys.ApiKeys(client: venice_ai._resource.SyncClientT)[source]

Provides access to API key management operations.

This class implements the synchronous interface for API key management, including creating, listing, deleting API keys, and managing rate limits. It inherits from APIResource which handles the underlying HTTP requests.

Parameters:

_client (VeniceClient) – The Venice client instance used for making API requests.

Example

from venice_ai import VeniceClient
from venice_ai.types.api_keys import ApiKeyCreateRequest

client = VeniceClient()

# List existing API keys
keys = client.api_keys.list(limit=10)
for key in keys:
    print(f"Key ID: {key.id}, Description: {key.description}")

# Create a new API key
create_request = ApiKeyCreateRequest(
    description="My Test Key",
    apiKeyType="INFERENCE"
)
new_key = client.api_keys.create(api_key_request=create_request)
print(f"Created key: {new_key.apiKey}")  # Only shown on creation

Parameters:

client (typing.TypeVar(SyncClientT, bound=VeniceClient))

create(*, api_key_request: venice_ai.types.api_keys.ApiKeyCreateRequest)[source]

Creates a new API key.

Creates a new API key with the specified parameters. The created API key will be returned only once in the response and cannot be retrieved later, so it should be securely stored immediately.

Parameters:

api_key_request (ApiKeyCreateRequest) –

Request object containing API key configuration. Must include at minimum a description and apiKeyType. The request can contain:

description (str): Human-readable description of the API key
apiKeyType (str): Type of API key (e.g., “INFERENCE”, “ADMIN”)
expiresAt (Optional[str]): ISO 8601 timestamp when key expires
consumptionLimit (Optional[int]): Maximum usage limit for the key

Returns:

Response containing the newly created API key details, including the secret key value (only returned once), key ID, creation timestamp, and other metadata.

Return type:

ApiKey

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error, such as when the maximum number of API keys has been reached or invalid parameters are provided.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

from venice_ai.types.api_keys import ApiKeyCreateRequest

# Create a basic API key
create_request = ApiKeyCreateRequest(
    description="My Test Key",
    apiKeyType="INFERENCE"
)
new_key = client.api_keys.create(api_key_request=create_request)
print(f"Created key ID: {new_key.id}")
print(f"API Key: {new_key.apiKey}")  # Store this securely!

# Create a key with expiration and limits
advanced_request = ApiKeyCreateRequest(
    description="Limited Production Key",
    apiKeyType="INFERENCE",
    expiresAt="2024-12-31T23:59:59Z",
    consumptionLimit=10000
)
limited_key = client.api_keys.create(api_key_request=advanced_request)
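
A hard-coded expiresAt value goes stale, so it can help to compute the timestamp relative to the current time. This is a small sketch using only the standard library; expires_in is a hypothetical helper, and the trailing "Z" format simply mirrors the example value above rather than a documented requirement of the API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expires_in(days: int, now: Optional[datetime] = None) -> str:
    """Return an ISO 8601 UTC timestamp `days` days after `now`."""
    if days <= 0:
        raise ValueError("days must be positive")
    # Default to the current time in UTC; `now` should be timezone-aware.
    current = now or datetime.now(timezone.utc)
    expiry = current.astimezone(timezone.utc) + timedelta(days=days)
    return expiry.strftime("%Y-%m-%dT%H:%M:%SZ")
```

The result can be passed as expiresAt=expires_in(90) when building an ApiKeyCreateRequest.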

create_web3_key(*, web3_key_request: venice_ai.types.api_keys.ApiKeyGenerateWeb3KeyCreateRequest)[source]

Creates a new Web3 API key.

Creates a new API key authenticated via a Web3 signature.

Parameters:

web3_key_request (ApiKeyGenerateWeb3KeyCreateRequest) – Request body containing Web3 authentication details (such as web3_network_id, web3_address, and signature) and API key parameters.

Returns:

Response containing the newly created API key details.

Return type:

ApiKeyGenerateWeb3KeyCreateResponse

Raises:

venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

delete(*, api_key_id: str)[source]

Deletes an API key.

Permanently deletes the specified API key. Once deleted, the API key can no longer be used to authenticate requests and this action cannot be undone. Use with caution in production environments.

Parameters:

api_key_id (str) – Unique identifier of the API key to delete. This is the key’s ID (not the secret key value) as returned by the create or list operations.

Returns:

Response indicating the result of the deletion operation, typically containing a success flag and deletion confirmation message.

Return type:

Dict[str, Any]

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error, such as when the API key ID does not exist or belongs to another account.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

# Delete an API key (use with caution)
result = client.api_keys.delete(api_key_id="key_123456789")
print(f"Deletion result: {result}")

# Safer pattern: delete only keys whose description marks them as test keys
keys = client.api_keys.list()
test_keys = [k for k in keys if k.description and "test" in k.description.lower()]
for test_key in test_keys:
    client.api_keys.delete(api_key_id=test_key.id)
    print(f"Deleted test key: {test_key.id}")
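
Since deletion cannot be undone, a dry run that only reports which keys would be removed is a cheap safeguard. The sketch below separates selection from deletion; delete_matching_keys and its dry_run flag are hypothetical and not part of the SDK. It assumes only that key objects expose id and description, as the examples above do.

```python
from typing import Any, Callable, Iterable, List


def delete_matching_keys(
    keys: Iterable[Any],
    delete: Callable[..., Any],
    marker: str = "test",
    dry_run: bool = True,
) -> List[str]:
    """Return IDs of keys whose description contains `marker`.

    The keys are deleted only when dry_run is False.
    """
    needle = marker.lower()
    matched = [
        key.id
        for key in keys
        if key.description and needle in key.description.lower()
    ]
    if not dry_run:
        for key_id in matched:
            delete(api_key_id=key_id)
    return matched
```

A typical call would be delete_matching_keys(client.api_keys.list(), client.api_keys.delete) to review the IDs first, then the same call with dry_run=False.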

Retrieves rate limit logs for API keys.

Returns a history of rate limit events, such as when rate limits were reset, exceeded, or modified. This can be useful for understanding API usage patterns, diagnosing rate limit issues, and optimizing request timing.

Parameters:

api_key_id (Optional[str]) – Specific API key ID to get logs for. If not provided, returns logs for the current API key.
start_date (Optional[str]) – Start date for log retrieval in ISO 8601 format (e.g., “2024-01-01T00:00:00Z”). If not provided, uses a default lookback period.
end_date (Optional[str]) – End date for log retrieval in ISO 8601 format (e.g., “2024-01-31T23:59:59Z”). If not provided, uses current time.
limit (Optional[int]) – Maximum number of log entries to return per page.
page (Optional[int]) – Page number for pagination (1-based indexing).

Returns:

A list of rate limit log entries with timestamps, event types, and related metadata.

Return type:

RateLimitLogList

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

# Get recent rate limit logs
logs = client.api_keys.get_rate_limit_logs(limit=10)
for log_entry in logs:
    print(f"Event: {log_entry.event_type} at {log_entry.timestamp}")

# Get logs for a specific date range
logs = client.api_keys.get_rate_limit_logs(
    start_date="2024-01-01T00:00:00Z",
    end_date="2024-01-31T23:59:59Z",
    limit=50
)
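
The start_date and end_date strings for a whole calendar month can be generated rather than typed by hand, which avoids off-by-one mistakes in short months. This is a sketch with a hypothetical helper, month_range; the timestamp layout copies the example values above.

```python
import calendar
from typing import Tuple


def month_range(year: int, month: int) -> Tuple[str, str]:
    """Return (start, end) ISO 8601 UTC timestamps covering one month."""
    # monthrange returns (weekday of the first day, number of days).
    last_day = calendar.monthrange(year, month)[1]
    start = f"{year:04d}-{month:02d}-01T00:00:00Z"
    end = f"{year:04d}-{month:02d}-{last_day:02d}T23:59:59Z"
    return start, end
```

The two values can then be unpacked into the call, for example start, end = month_range(2024, 1) followed by get_rate_limit_logs(start_date=start, end_date=end, limit=50).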

get_rate_limits()[source]

Retrieves rate limit information for the current API key.

Returns information about the rate limits applied to the current API key, including the limits per minute, hour, day, and month, as well as the current usage against those limits.

Returns:

Rate limit information, including limits and current usage.

Return type:

RateLimitInfo

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

get_web3_token()[source]

Retrieves a token for Web3 API key generation.

This token is required for the subsequent POST request to create a Web3 API key.

Returns:

Response containing the token required for Web3 key generation.

Return type:

ApiKeyGenerateWeb3KeyGetResponse

Raises:

venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

list(*, page: int | None = None, limit: int | None = None)[source]

Lists API keys for the authenticated account, with optional pagination.

Retrieves a list of API keys associated with the current account. This includes active and inactive API keys. Supports pagination for managing large numbers of API keys.

Parameters:

page (Optional[int]) – Page number to retrieve (1-based indexing). If not provided, returns the first page.
limit (Optional[int]) – Maximum number of API keys to return per page. If not provided, uses the server’s default limit.

Returns:

A list of API key objects containing metadata such as ID, description, creation date, and status. Note that the actual API key values are not included in the response for security reasons.

Return type:

List[ApiKey]

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

# List all API keys
all_keys = client.api_keys.list()

# List with pagination
page_keys = client.api_keys.list(page=1, limit=5)
for key in page_keys:
    print(f"Key ID: {key.id}, Description: {key.description}")
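
To walk every page rather than a single one, the page and limit parameters can be driven from a small generator. The sketch below assumes that a page containing fewer than limit items is the last one; the documentation above does not state how the end of the listing is signalled, so treat that stopping rule and the iter_all_pages name as assumptions.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_all_pages(
    fetch_page: Callable[..., List[T]],
    limit: int = 50,
    max_pages: int = 1000,
) -> Iterator[T]:
    """Yield items from successive 1-based pages until a short page appears."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page=page, limit=limit)
        yield from items
        # Assumed stopping rule: fewer items than requested means no more pages.
        if len(items) < limit:
            break
```

With the synchronous client this would be used as for key in iter_all_pages(client.api_keys.list, limit=25). The max_pages cap guards against an endless loop if the server ignores the page argument.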

retrieve(*, api_key_id: str)[source]

Retrieves a specific API key by ID.

Fetches the details of a specific API key using its unique identifier. Note that the actual API key value is not included in the response for security reasons.

Parameters:

api_key_id (str) – Unique identifier of the API key to retrieve. This is the key’s ID (not the secret key value) as returned by the create or list operations.

Returns:

API key details including metadata such as description, creation date, expiration, usage statistics, and other configuration information.

Return type:

Dict[str, Any]

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.NotFoundError – If the API key ID does not exist.
venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

# Retrieve a specific API key
key_details = client.api_keys.retrieve(api_key_id="key_123456789")
print(f"Key description: {key_details['description']}")
print(f"Created at: {key_details['createdAt']}")

class venice_ai.resources.api_keys.AsyncApiKeys(client: venice_ai._resource.AsyncClientT)[source]

Provides access to API key management operations asynchronously.

This class implements the asynchronous interface for API key management, including creating, listing, deleting API keys, and managing rate limits. It inherits from AsyncAPIResource which handles the underlying HTTP requests. All methods return awaitable coroutines that should be awaited by the caller.

Parameters:

_client (AsyncVeniceClient) – The AsyncVeniceClient instance used for making asynchronous API requests.

Example

import asyncio
from venice_ai import AsyncVeniceClient
from venice_ai.types.api_keys import ApiKeyCreateRequest

async def manage_api_keys():
    client = AsyncVeniceClient()

    # List existing API keys
    keys = await client.api_keys.list(limit=10)
    for key in keys:
        print(f"Key ID: {key.id}, Description: {key.description}")

    # Create a new API key
    create_request = ApiKeyCreateRequest(
        description="My Async Test Key",
        apiKeyType="INFERENCE"
    )
    new_key = await client.api_keys.create(api_key_request=create_request)
    print(f"Created key: {new_key.apiKey}")  # Only shown on creation

asyncio.run(manage_api_keys())
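
Because every method returns an awaitable, several calls can run concurrently. Given that API keys are subject to rate limits, it is usually wise to cap how many requests are in flight. This is a generic sketch using asyncio.Semaphore and asyncio.gather; gather_limited is a hypothetical helper and the default of five concurrent calls is an arbitrary assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def gather_limited(
    fetch: Callable[[str], Awaitable[Any]],
    ids: List[str],
    max_concurrency: int = 5,
) -> List[Any]:
    """Await fetch(id) for every id, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(item_id: str) -> Any:
        async with semaphore:
            return await fetch(item_id)

    # gather preserves the order of `ids` in the returned list.
    return await asyncio.gather(*(bounded(item_id) for item_id in ids))
```

Inside a coroutine, this could wrap retrieve, for example await gather_limited(lambda key_id: client.api_keys.retrieve(api_key_id=key_id), key_ids).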

Parameters:

client (typing.TypeVar(AsyncClientT, bound=AsyncVeniceClient))

async create(*, api_key_request: venice_ai.types.api_keys.ApiKeyCreateRequest)[source]

Creates a new API key asynchronously.

Creates a new API key with the specified parameters. The created API key will be returned only once in the response and cannot be retrieved later, so it should be securely stored immediately.

Parameters:

api_key_request (ApiKeyCreateRequest) –

Request object containing API key configuration. Must include at minimum a description and apiKeyType. The request can contain:

description (str): Human-readable description of the API key
apiKeyType (str): Type of API key (e.g., “INFERENCE”, “ADMIN”)
expiresAt (Optional[str]): ISO 8601 timestamp when key expires
consumptionLimit (Optional[ConsumptionLimit]): Usage limits for the key

Returns:

Response containing the newly created API key details, including the secret key value (only returned once), key ID, creation timestamp, and other metadata.

Return type:

ApiKey

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error, such as when the maximum number of API keys has been reached or invalid parameters are provided.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

from venice_ai.types.api_keys import ApiKeyCreateRequest

# Create a basic API key asynchronously
create_request = ApiKeyCreateRequest(
    description="My Async Test Key",
    apiKeyType="INFERENCE"
)
new_key = await client.api_keys.create(api_key_request=create_request)
print(f"Created key ID: {new_key.id}")
print(f"API Key: {new_key.apiKey}")  # Store this securely!
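
Since the secret value is returned only once, printing it is rarely enough. One option is to write it immediately to a file that only the current user can read. This is a sketch only: write_secret is a hypothetical helper, the permission bits have full effect on POSIX systems but are largely ignored on Windows, and a dedicated secrets manager is generally a better home for production keys.

```python
import os


def write_secret(path: str, secret: str) -> None:
    """Create `path` with owner-only permissions and write the secret to it."""
    # O_EXCL makes the call fail instead of overwriting an existing file;
    # mode 0o600 grants read/write to the owner only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(secret)
```

After creating a key, the call would be write_secret("venice_api_key.txt", new_key.apiKey), where the file name is only an example.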

async create_web3_key(*, web3_key_request: venice_ai.types.api_keys.ApiKeyGenerateWeb3KeyCreateRequest)[source]

Creates a new Web3 API key asynchronously.

Creates a new API key authenticated via a Web3 signature.

Parameters:

web3_key_request (ApiKeyGenerateWeb3KeyCreateRequest) – Request body containing Web3 authentication details (such as web3_network_id, web3_address, and signature) and API key parameters.

Returns:

Response containing the newly created API key details.

Return type:

ApiKeyGenerateWeb3KeyCreateResponse

Raises:

venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

async delete(*, api_key_id: str)[source]

Deletes an API key asynchronously.

Permanently deletes the specified API key. Once deleted, the API key can no longer be used to authenticate requests and this action cannot be undone.

Parameters:

api_key_id (str) – ID of the API key to delete. This is the key’s unique identifier, not the secret key value.

Returns:

Response indicating the result of the operation, typically containing a success flag and deletion confirmation.

Return type:

Dict[str, Any]

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error, such as when the API key ID does not exist or belongs to another account.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

# Delete an API key asynchronously (use with caution)
result = await client.api_keys.delete(api_key_id="key_123456789")
print(f"Deletion result: {result}")

# Safer pattern: delete only keys whose description marks them as test keys
keys = await client.api_keys.list()
test_keys = [k for k in keys if k.description and "test" in k.description.lower()]
for test_key in test_keys:
    await client.api_keys.delete(api_key_id=test_key.id)
    print(f"Deleted test key: {test_key.id}")

Retrieves rate limit logs for API keys asynchronously.

Parameters:

api_key_id (Optional[str]) – Specific API key ID to get logs for. If not provided, returns logs for the current API key.
start_date (Optional[str]) – Start date for log retrieval in ISO 8601 format (e.g., “2024-01-01T00:00:00Z”). If not provided, uses a default lookback period.
end_date (Optional[str]) – End date for log retrieval in ISO 8601 format (e.g., “2024-01-31T23:59:59Z”). If not provided, uses current time.
limit (Optional[int]) – Maximum number of log entries to return per page.
page (Optional[int]) – Page number for pagination (1-based indexing).

Returns:

A list of rate limit log entries with timestamps, event types, and related metadata.

Return type:

RateLimitLogList

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

# Get recent rate limit logs asynchronously
logs = await client.api_keys.get_rate_limit_logs(limit=10)
for log_entry in logs:
    print(f"Event: {log_entry.event_type} at {log_entry.timestamp}")

# Get logs for a specific date range
logs = await client.api_keys.get_rate_limit_logs(
    start_date="2024-01-01T00:00:00Z",
    end_date="2024-01-31T23:59:59Z",
    limit=50
)

async get_rate_limits()[source]

Retrieves rate limit information for the current API key asynchronously.

Returns information about the rate limits applied to the current API key, including the limits per minute, hour, day, and month, as well as the current usage against those limits.

Returns:

Rate limit information, including limits and current usage.

Return type:

RateLimitInfo

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

async get_web3_token()[source]

Retrieves a token for Web3 API key generation asynchronously.

This token is required for the subsequent POST request to create a Web3 API key.

Returns:

Response containing the token required for Web3 key generation.

Return type:

ApiKeyGenerateWeb3KeyGetResponse

Raises:

venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

async list(*, page: int | None = None, limit: int | None = None)[source]

Lists API keys for the authenticated account asynchronously, with optional pagination.

Retrieves a list of API keys associated with the current account. This includes active and inactive API keys. Supports pagination.

Parameters:

page (Optional[int]) – Page number to retrieve (1-based indexing). If not provided, returns the first page.
limit (Optional[int]) – Maximum number of API keys to return per page. If not provided, uses the server’s default limit.

Returns:

A list of API key objects containing metadata such as ID, description, creation date, and status. Note that the actual API key values are not included in the response for security reasons.

Return type:

List[ApiKey]

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

# List all API keys asynchronously
all_keys = await client.api_keys.list()

# List with pagination
page_keys = await client.api_keys.list(page=1, limit=5)
for key in page_keys:
    print(f"Key ID: {key.id}, Description: {key.description}")

async retrieve(*, api_key_id: str)[source]

Retrieves a specific API key by ID asynchronously.

Fetches the details of a specific API key using its unique identifier. Note that the actual API key value is not included in the response for security reasons.

Parameters:

api_key_id (str) – Unique identifier of the API key to retrieve. This is the key’s ID (not the secret key value) as returned by the create or list operations.

Returns:

API key details including metadata such as description, creation date, expiration, usage statistics, and other configuration information.

Return type:

Dict[str, Any]

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.NotFoundError – If the API key ID does not exist.
venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

# Retrieve a specific API key asynchronously
key_details = await client.api_keys.retrieve(api_key_id="key_123456789")
print(f"Key description: {key_details['description']}")
print(f"Created at: {key_details['createdAt']}")

class venice_ai.resources.api_keys.ApiKeys(client: venice_ai._resource.SyncClientT)[source]

Bases: APIResource

Provides access to API key management operations.

Parameters:

_client (VeniceClient) – The Venice client instance used for making API requests.

Example

from venice_ai import VeniceClient
from venice_ai.types.api_keys import ApiKeyCreateRequest

client = VeniceClient()

# List existing API keys
keys = client.api_keys.list(limit=10)
for key in keys:
    print(f"Key ID: {key.id}, Description: {key.description}")

# Create a new API key
create_request = ApiKeyCreateRequest(
    description="My Test Key",
    apiKeyType="INFERENCE"
)
new_key = client.api_keys.create(api_key_request=create_request)
print(f"Created key: {new_key.apiKey}")  # Only shown on creation

Parameters:

client (typing.TypeVar(SyncClientT, bound=VeniceClient))

create(*, api_key_request: venice_ai.types.api_keys.ApiKeyCreateRequest)[source]

Creates a new API key.

Creates a new API key with the specified parameters. The created API key will be returned only once in the response and cannot be retrieved later, so it should be securely stored immediately.

Parameters:

api_key_request (ApiKeyCreateRequest) –

Request object containing API key configuration. Must include at minimum a description and apiKeyType. The request can contain:

description (str): Human-readable description of the API key
apiKeyType (str): Type of API key (e.g., “INFERENCE”, “ADMIN”)
expiresAt (Optional[str]): ISO 8601 timestamp when key expires
consumptionLimit (Optional[int]): Maximum usage limit for the key

Returns:

Response containing the newly created API key details, including the secret key value (only returned once), key ID, creation timestamp, and other metadata.

Return type:

ApiKey

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error, such as when the maximum number of API keys has been reached or invalid parameters are provided.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

from venice_ai.types.api_keys import ApiKeyCreateRequest

# Create a basic API key
create_request = ApiKeyCreateRequest(
    description="My Test Key",
    apiKeyType="INFERENCE"
)
new_key = client.api_keys.create(api_key_request=create_request)
print(f"Created key ID: {new_key.id}")
print(f"API Key: {new_key.apiKey}")  # Store this securely!

# Create a key with expiration and limits
advanced_request = ApiKeyCreateRequest(
    description="Limited Production Key",
    apiKeyType="INFERENCE",
    expiresAt="2024-12-31T23:59:59Z",
    consumptionLimit=10000
)
limited_key = client.api_keys.create(api_key_request=advanced_request)

create_web3_key(*, web3_key_request: venice_ai.types.api_keys.ApiKeyGenerateWeb3KeyCreateRequest)[source]

Creates a new Web3 API key.

Creates a new API key authenticated via a Web3 signature.

Parameters:

web3_key_request (ApiKeyGenerateWeb3KeyCreateRequest) – Request body containing Web3 authentication details (such as web3_network_id, web3_address, and signature) and API key parameters.

Returns:

Response containing the newly created API key details.

Return type:

ApiKeyGenerateWeb3KeyCreateResponse

Raises:

venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

delete(*, api_key_id: str)[source]

Deletes an API key.

Permanently deletes the specified API key. Once deleted, the API key can no longer be used to authenticate requests and this action cannot be undone. Use with caution in production environments.

Parameters:

api_key_id (str) – Unique identifier of the API key to delete. This is the key’s ID (not the secret key value) as returned by the create or list operations.

Returns:

Response indicating the result of the deletion operation, typically containing a success flag and deletion confirmation message.

Return type:

Dict[str, Any]

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error, such as when the API key ID does not exist or belongs to another account.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

# Delete an API key (use with caution)
result = client.api_keys.delete(api_key_id="key_123456789")
print(f"Deletion result: {result}")

# Safer pattern: delete only keys whose description marks them as test keys
keys = client.api_keys.list()
test_keys = [k for k in keys if k.description and "test" in k.description.lower()]
for test_key in test_keys:
    client.api_keys.delete(api_key_id=test_key.id)
    print(f"Deleted test key: {test_key.id}")

Retrieves rate limit logs for API keys.

Parameters:

api_key_id (Optional[str]) – Specific API key ID to get logs for. If not provided, returns logs for the current API key.
start_date (Optional[str]) – Start date for log retrieval in ISO 8601 format (e.g., “2024-01-01T00:00:00Z”). If not provided, uses a default lookback period.
end_date (Optional[str]) – End date for log retrieval in ISO 8601 format (e.g., “2024-01-31T23:59:59Z”). If not provided, uses current time.
limit (Optional[int]) – Maximum number of log entries to return per page.
page (Optional[int]) – Page number for pagination (1-based indexing).

Returns:

A list of rate limit log entries with timestamps, event types, and related metadata.

Return type:

RateLimitLogList

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

# Get recent rate limit logs
logs = client.api_keys.get_rate_limit_logs(limit=10)
for log_entry in logs:
    print(f"Event: {log_entry.event_type} at {log_entry.timestamp}")

# Get logs for a specific date range
logs = client.api_keys.get_rate_limit_logs(
    start_date="2024-01-01T00:00:00Z",
    end_date="2024-01-31T23:59:59Z",
    limit=50
)

get_rate_limits()[source]

Retrieves rate limit information for the current API key.

Returns information about the rate limits applied to the current API key, including the limits per minute, hour, day, and month, as well as the current usage against those limits.

Returns:

Rate limit information, including limits and current usage.

Return type:

RateLimitInfo

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

get_web3_token()[source]

Retrieves a token for Web3 API key generation.

This token is required for the subsequent POST request to create a Web3 API key.

Returns:

Response containing the token required for Web3 key generation.

Return type:

ApiKeyGenerateWeb3KeyGetResponse

Raises:

venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

list(*, page: int | None = None, limit: int | None = None)[source]

Lists API keys for the authenticated account, with optional pagination.

Retrieves a list of API keys associated with the current account. This includes active and inactive API keys. Supports pagination for managing large numbers of API keys.

Parameters:

page (Optional[int]) – Page number to retrieve (1-based indexing). If not provided, returns the first page.
limit (Optional[int]) – Maximum number of API keys to return per page. If not provided, uses the server’s default limit.

Returns:

A list of API key objects containing metadata such as ID, description, creation date, and status. Note that the actual API key values are not included in the response for security reasons.

Return type:

List[ApiKey]

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

# List all API keys
all_keys = client.api_keys.list()

# List with pagination
page_keys = client.api_keys.list(page=1, limit=5)
for key in page_keys:
    print(f"Key ID: {key.id}, Description: {key.description}")

class venice_ai.resources.api_keys.AsyncApiKeys(client: venice_ai._resource.AsyncClientT)[source]

Bases: AsyncAPIResource

Provides access to API key management operations asynchronously.

Parameters:

_client (AsyncVeniceClient) – The AsyncVeniceClient instance used for making asynchronous API requests.

Example

import asyncio
from venice_ai import AsyncVeniceClient
from venice_ai.types.api_keys import ApiKeyCreateRequest

async def manage_api_keys():
    client = AsyncVeniceClient()

    # List existing API keys
    keys = await client.api_keys.list(limit=10)
    for key in keys:
        print(f"Key ID: {key.id}, Description: {key.description}")

    # Create a new API key
    create_request = ApiKeyCreateRequest(
        description="My Async Test Key",
        apiKeyType="INFERENCE"
    )
    new_key = await client.api_keys.create(api_key_request=create_request)
    print(f"Created key: {new_key.apiKey}")  # Only shown on creation

asyncio.run(manage_api_keys())

Parameters:

client (typing.TypeVar(AsyncClientT, bound=AsyncVeniceClient))

async create(*, api_key_request: venice_ai.types.api_keys.ApiKeyCreateRequest)[source]

Creates a new API key asynchronously.

Creates a new API key with the specified parameters. The created API key will be returned only once in the response and cannot be retrieved later, so it should be securely stored immediately.

Parameters:

api_key_request (ApiKeyCreateRequest) –

Request object containing API key configuration. Must include at minimum a description and apiKeyType. The request can contain:

description (str): Human-readable description of the API key
apiKeyType (str): Type of API key (e.g., “INFERENCE”, “ADMIN”)
expiresAt (Optional[str]): ISO 8601 timestamp when key expires
consumptionLimit (Optional[ConsumptionLimit]): Usage limits for the key

Returns:

Response containing the newly created API key details, including the secret key value (only returned once), key ID, creation timestamp, and other metadata.

Return type:

ApiKey

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error, such as when maximum API key limit is reached or invalid parameters are provided.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

from venice_ai.types.api_keys import ApiKeyCreateRequest

# Create a basic API key asynchronously
create_request = ApiKeyCreateRequest(
    description="My Async Test Key",
    apiKeyType="INFERENCE"
)
new_key = await client.api_keys.create(api_key_request=create_request)
print(f"Created key ID: {new_key.id}")
print(f"API Key: {new_key.apiKey}")  # Store this securely!
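
Because the secret value is returned only once, one option is to persist it straight after the call. The following is a minimal sketch using only the standard library; store_secret and the file location are hypothetical and not part of venice_ai, and the stand-in string takes the place of new_key.apiKey.

```python
import os
import tempfile


def store_secret(path: str, secret: str) -> None:
    """Write a secret to `path`, creating the file with owner-only permissions."""
    # O_EXCL refuses to overwrite an existing file. Mode 0o600 means read/write
    # for the owner only; it is honoured on POSIX systems and largely ignored
    # on Windows.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(secret)


if __name__ == "__main__":
    # Stand-in value; in real use this would be new_key.apiKey from create().
    demo_path = os.path.join(tempfile.mkdtemp(), "venice_api_key")
    store_secret(demo_path, "example-secret-value")
    print(f"Secret written to {demo_path}")
```

A dedicated secret manager or the operating system keychain is usually a better home for production credentials; the file-based approach is only the simplest illustration.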

async create_web3_key(*, web3_key_request: venice_ai.types.api_keys.ApiKeyGenerateWeb3KeyCreateRequest)

Creates a new Web3 API key asynchronously.

Creates a new API key authenticated via a Web3 signature.

Parameters:

web3_key_request (ApiKeyGenerateWeb3KeyCreateRequest) – Request body containing Web3 authentication details (such as web3_network_id, web3_address, and signature) and API key parameters.

Returns:

Response containing the newly created API key details.

Return type:

ApiKeyGenerateWeb3KeyCreateResponse

Raises:

venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

async delete(*, api_key_id: str)

Deletes an API key asynchronously.

Permanently deletes the specified API key. Once deleted, the key can no longer be used to authenticate requests, and the action cannot be undone.

Parameters:

api_key_id (str) – ID of the API key to delete. This is the key’s unique identifier, not the secret key value.

Returns:

Response indicating the result of the operation, typically containing a success flag and deletion confirmation.

Return type:

Dict[str, Any]

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error, such as when the API key ID does not exist or belongs to another account.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

# Delete an API key asynchronously (use with caution)
result = await client.api_keys.delete(api_key_id="key_123456789")
print(f"Deletion result: {result}")

# Delete only keys whose description contains "test"
keys = await client.api_keys.list()
test_keys = [k for k in keys if "test" in k.description.lower()]
for test_key in test_keys:
    await client.api_keys.delete(api_key_id=test_key.id)
    print(f"Deleted test key: {test_key.id}")
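
Since deletion cannot be undone, it can help to preview which keys a filter would match before deleting anything. The sketch below shows one way to do that with a dry-run flag. FakeKey, FakeApiKeys, and delete_matching are hypothetical names introduced for illustration; the fake resource only mimics the list() and delete(api_key_id=...) calls shown above so that the example runs without network access.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class FakeKey:
    """Stand-in for an API key record; only `id` and `description` are used."""
    id: str
    description: str


class FakeApiKeys:
    """In-memory stand-in mimicking the list() and delete() calls."""

    def __init__(self, keys: List[FakeKey]) -> None:
        self._keys = keys[:]

    async def list(self) -> List[FakeKey]:
        return self._keys[:]

    async def delete(self, *, api_key_id: str) -> dict:
        self._keys = [k for k in self._keys if k.id != api_key_id]
        return {"success": True}


async def delete_matching(
    api_keys: Any,
    predicate: Callable[[Any], bool],
    *,
    dry_run: bool = True,
) -> List[str]:
    """Return IDs of keys matching `predicate`; delete them only if dry_run is False."""
    keys = await api_keys.list()
    matched = [key.id for key in keys if predicate(key)]
    if not dry_run:
        for key_id in matched:
            await api_keys.delete(api_key_id=key_id)
    return matched


async def main() -> List[str]:
    resource = FakeApiKeys([FakeKey("key_1", "Test key"), FakeKey("key_2", "Production")])

    def is_test(key: Any) -> bool:
        return "test" in key.description.lower()

    preview = await delete_matching(resource, is_test)       # nothing deleted yet
    await delete_matching(resource, is_test, dry_run=False)  # now delete for real
    return preview


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With a real client, client.api_keys would be passed in place of the fake resource, assuming it exposes the same two calls.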

Retrieves rate limit logs for API keys asynchronously.

Parameters:

api_key_id (Optional[str]) – Specific API key ID to get logs for. If not provided, returns logs for the current API key.
start_date (Optional[str]) – Start date for log retrieval in ISO 8601 format (e.g., “2024-01-01T00:00:00Z”). If not provided, uses a default lookback period.
end_date (Optional[str]) – End date for log retrieval in ISO 8601 format (e.g., “2024-01-31T23:59:59Z”). If not provided, uses current time.
limit (Optional[int]) – Maximum number of log entries to return per page.
page (Optional[int]) – Page number for pagination (1-based indexing).

Returns:

A list of rate limit log entries with timestamps, event types, and related metadata.

Return type:

RateLimitLogList

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

# Get recent rate limit logs asynchronously
logs = await client.api_keys.get_rate_limit_logs(limit=10)
for log_entry in logs:
    print(f"Event: {log_entry.event_type} at {log_entry.timestamp}")

# Get logs for a specific date range
logs = await client.api_keys.get_rate_limit_logs(
    start_date="2024-01-01T00:00:00Z",
    end_date="2024-01-31T23:59:59Z",
    limit=50
)
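
The start_date and end_date strings can be built rather than typed by hand. A minimal sketch, assuming the API accepts second-precision UTC timestamps with a trailing "Z" as in the examples above; iso_utc and lookback_window are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 UTC string ending in 'Z'."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def lookback_window(days: int, now: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (start_date, end_date) strings covering the last `days` days."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    return iso_utc(start), iso_utc(end)


if __name__ == "__main__":
    start_date, end_date = lookback_window(7)
    print(start_date, end_date)
```

The two strings can then be passed as the start_date and end_date arguments of get_rate_limit_logs.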

async get_rate_limits()

Retrieves rate limit information for the current API key asynchronously.

Returns information about the rate limits applied to the current API key, including the limits per minute, hour, day, and month, as well as the current usage against those limits.

Returns:

Rate limit information, including limits and current usage.

Return type:

RateLimitInfo

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

async get_web3_token()

Retrieves a token for Web3 API key generation asynchronously.

This token is required for the subsequent POST request to create a Web3 API key.

Returns:

Response containing the token required for Web3 key generation.

Return type:

ApiKeyGenerateWeb3KeyGetResponse

Raises:

venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

async list(*, page: int | None = None, limit: int | None = None)

Lists API keys for the authenticated account asynchronously, with optional pagination.

Retrieves the API keys associated with the current account, including both active and inactive keys.

Parameters:

page (Optional[int]) – Page number to retrieve (1-based indexing). If not provided, returns the first page.
limit (Optional[int]) – Maximum number of API keys to return per page. If not provided, uses the server’s default limit.

Returns:

A list of API key objects containing metadata such as ID, description, creation date, and status. Note that the actual API key values are not included in the response for security reasons.

Return type:

List[ApiKey]

Raises:

venice_ai.exceptions.AuthenticationError – If authentication fails.
venice_ai.exceptions.APIError – If the API returns an error.
venice_ai.exceptions.APIConnectionError – If there’s an issue connecting to the API.

Example

# List all API keys asynchronously
all_keys = await client.api_keys.list()

# List with pagination
page_keys = await client.api_keys.list(page=1, limit=5)
for key in page_keys:
    print(f"Key ID: {key.id}, Description: {key.description}")
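
To walk through every page, the page and limit parameters can be combined in a loop. The sketch below assumes that a page shorter than limit marks the end of the data, which the source does not confirm; FakePagedKeys and list_all are hypothetical names, and the fake resource stands in for client.api_keys so the example runs offline.

```python
import asyncio
from typing import Any, List


class FakePagedKeys:
    """In-memory stand-in for a resource whose list() accepts page and limit."""

    def __init__(self, items: List[Any]) -> None:
        self._items = items

    async def list(self, *, page: int = 1, limit: int = 5) -> List[Any]:
        start = (page - 1) * limit  # pages are 1-based
        return self._items[start:start + limit]


async def list_all(api_keys: Any, *, limit: int = 5, max_pages: int = 100) -> List[Any]:
    """Collect items page by page until a short or empty page is returned."""
    collected: List[Any] = []
    for page in range(1, max_pages + 1):
        batch = await api_keys.list(page=page, limit=limit)
        collected.extend(batch)
        if len(batch) < limit:  # assumed end-of-data signal
            break
    return collected


if __name__ == "__main__":
    fake = FakePagedKeys([f"key_{n}" for n in range(12)])
    print(len(asyncio.run(list_all(fake, limit=5))))
```

The max_pages cap is a safeguard against looping forever if the server keeps returning full pages.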

Audio Resources

Venice AI Audio API resources.

This module provides classes for interacting with the Venice AI Audio API, supporting speech synthesis operations. The module includes both synchronous and asynchronous interfaces for audio generation with various voice options and output formats.

The audio API allows for:

- Converting text to natural-sounding speech (text-to-speech)
- Selecting from multiple voice options for speech synthesis
- Controlling speech speed and output format
- Both full and streaming response modes

class venice_ai.resources.audio.AsyncAudio(client: venice_ai._resource.AsyncClientT)

Bases: AsyncAPIResource

Provides access to text-to-speech (TTS) audio generation operations asynchronously.

This class handles asynchronous audio generation requests, supporting both streaming and non-streaming modes. It allows conversion of text to natural-sounding speech using various voice models and output formats in async applications.

Parameters: client (AsyncVeniceClient) – The async Venice AI client instance used for making API requests.

Note

This class is typically accessed through the AsyncVeniceClient.audio property rather than being instantiated directly.

Generates audio from input text asynchronously.

Asynchronously converts the provided text to speech using the specified model and voice. The audio can be returned either as complete binary data or as an async stream of audio chunks for real-time processing.

Parameters:

model (str) – ID of the model to use for speech generation (e.g., “tts-kokoro”).
input (str) – The text to convert to speech. Maximum length varies by model.
voice (Union[str, venice_ai.types.audio.Voice]) – The voice to use for the generated audio. Can be a string literal or a Voice enum value (e.g., Voice.KOKORO_DEFAULT or “kokoro-default”).
response_format (Optional[Union[str, venice_ai.types.audio.ResponseFormat]]) – The format to return the audio in. Can be a string literal or a ResponseFormat enum value. Defaults to “mp3”.
speed (Optional[float]) – The speed of the generated audio. Select a value from 0.25 to 4.0. Defaults to 1.0.
stream (Optional[bool]) – Whether to stream the audio data. If True, returns an AsyncIterator of audio chunks. If False, returns the complete audio data. Defaults to False.
timeout (Optional[Union[float, httpx.Timeout]]) – Request timeout in seconds or an httpx.Timeout object. If not provided, uses the client’s default timeout.

Returns:

If stream is False, returns the audio data as bytes (awaitable). If stream is True, returns an AsyncIterator yielding chunks of audio data as bytes.

Return type:

Union[bytes, AsyncIterator[bytes]]

Raises:

venice_ai.exceptions.APIError – If the API request fails.
ValueError – If the input text is empty or invalid parameters are provided.

Example

Basic non-streaming text-to-speech:

import asyncio
from venice_ai import AsyncVeniceClient
from venice_ai.types.audio import Voice, ResponseFormat

async def generate_speech():
    client = AsyncVeniceClient()

    # Generate speech with enum values
    audio_bytes = await client.audio.create_speech(
        model="tts-kokoro",
        input="Hello, this is a test.",
        voice=Voice.KOKORO_DEFAULT
    )

    # Save to file
    with open("speech.mp3", "wb") as f:
        f.write(audio_bytes)

    # Using string literals and different format
    audio_bytes = await client.audio.create_speech(
        model="tts-kokoro",
        input="Hello with different settings.",
        voice="kokoro-default",
        response_format="wav",
        speed=1.2
    )

asyncio.run(generate_speech())

Streaming text-to-speech:

async def stream_speech():
    client = AsyncVeniceClient()

    # Stream audio data
    stream = client.audio.create_speech(
        model="tts-kokoro",
        input="This is a streamed audio example.",
        voice="kokoro-default",
        stream=True
    )

    # Write streamed chunks to file
    with open("streamed_speech.mp3", "wb") as f:
        async for chunk in stream:
            f.write(chunk)

asyncio.run(stream_speech())
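
The chunk-writing loop can be factored into a reusable helper that also reports how many bytes were written. This is a sketch only: save_audio_stream and fake_stream are hypothetical names, and the fake generator stands in for the async iterator that the streaming mode is documented to return.

```python
import asyncio
import os
import tempfile
from typing import AsyncIterator


async def save_audio_stream(chunks: AsyncIterator[bytes], path: str) -> int:
    """Write each chunk to `path` as it arrives and return the total byte count."""
    total = 0
    with open(path, "wb") as handle:
        async for chunk in chunks:
            handle.write(chunk)
            total += len(chunk)
    return total


async def fake_stream() -> AsyncIterator[bytes]:
    """Stand-in for a streamed response; yields three small chunks."""
    for part in (b"ID3", b"\x00\x01", b"\x02"):
        yield part


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "streamed_speech.mp3")
    written = asyncio.run(save_audio_stream(fake_stream(), target))
    print(f"Wrote {written} bytes to {target}")
```

Note that the file writes here are blocking; for large outputs in a busy event loop, offloading them to a thread may be preferable.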

async get_voices(*, model_id: str | None = None, gender: Literal['male', 'female', 'unknown'] | None = None, region_code: str | None = None)

Lists available text-to-speech (TTS) voices asynchronously, with optional filtering.

This method retrieves information about available voices for TTS models, allowing filtering by model ID, gender, and region code.

Parameters:

model_id (Optional[str]) – If provided, only voices for this TTS model ID are returned.
gender (Optional[Literal['male', 'female', 'unknown']]) – Filter voices by gender (“male”, “female”, or “unknown”). Gender is inferred from the voice ID prefix.
region_code (Optional[str]) – Filter voices by the raw two-letter region/language prefix of the voice ID (e.g., “af” for American female-sounding voices, “zm” for Chinese male-sounding voices).

Returns:

A VoiceList object containing a list of VoiceDetail objects that match the filter criteria, along with information about the applied filters.

Return type:

venice_ai.types.audio.VoiceList

Raises:

venice_ai.exceptions.APIError – If an API error occurs during the request to the underlying models endpoint.
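
The prefix-based inference described for gender and region_code can be illustrated with a small parser. This is a sketch under stated assumptions, not the SDK's implementation: it assumes voice IDs start with a two-letter prefix followed by an underscore (for example a hypothetical ID such as "af_sky"), and that the second letter is "f" or "m". split_voice_prefix is a hypothetical helper name.

```python
from typing import Tuple


def split_voice_prefix(voice_id: str) -> Tuple[str, str]:
    """Return (region_code, gender) inferred from a voice ID's two-letter prefix.

    Returns ("", "unknown") when the ID does not start with a two-letter
    prefix followed by an underscore (an assumed ID layout).
    """
    prefix = voice_id.split("_", 1)[0].lower()
    if len(prefix) != 2:
        return "", "unknown"
    gender = {"f": "female", "m": "male"}.get(prefix[1], "unknown")
    return prefix, gender


if __name__ == "__main__":
    for example_id in ("af_sky", "zm_example", "kokoro-default"):
        print(example_id, split_voice_prefix(example_id))
```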

class venice_ai.resources.audio.Audio(client: venice_ai._resource.SyncClientT)

Bases: APIResource

Provides access to text-to-speech (TTS) audio generation operations.

This class handles synchronous audio generation requests, supporting both streaming and non-streaming modes. It allows conversion of text to natural-sounding speech using various voice models and output formats.

Parameters: client (VeniceClient) – The Venice AI client instance used for making API requests.

Note

This class is typically accessed through the VeniceClient.audio property rather than being instantiated directly.

Generates audio from input text.

Converts the provided text to speech using the specified model and voice. The audio can be returned either as complete binary data or as a stream of audio chunks for real-time processing.

Parameters:

model (str) – ID of the model to use for speech generation (e.g., “tts-kokoro”).
input (str) – The text to convert to speech. Maximum length varies by model.
voice (Union[str, venice_ai.types.audio.Voice]) – The voice to use for the generated audio. Can be a string literal or a Voice enum value (e.g., Voice.KOKORO_DEFAULT or “kokoro-default”).
response_format (Optional[Union[str, venice_ai.types.audio.ResponseFormat]]) – The format to return the audio in. Can be a string literal or a ResponseFormat enum value. Defaults to “mp3”.
speed (Optional[float]) – The speed of the generated audio. Select a value from 0.25 to 4.0. Defaults to 1.0.
stream (Optional[bool]) – Whether to stream the audio data. If True, returns an Iterator of audio chunks. If False, returns the complete audio data. Defaults to False.
timeout (Optional[Union[float, httpx.Timeout]]) – Request timeout in seconds or an httpx.Timeout object. If not provided, uses the client’s default timeout.

Returns:

If stream is False, returns the audio data as bytes. If stream is True, returns an Iterator yielding chunks of audio data as bytes.

Return type:

Union[bytes, Iterator[bytes]]

Raises:

venice_ai.exceptions.APIError – If the API request fails.
ValueError – If the input text is empty or invalid parameters are provided.

Example

Basic non-streaming text-to-speech:

from venice_ai import VeniceClient
from venice_ai.types.audio import Voice, ResponseFormat

client = VeniceClient()

# Generate speech with enum values
audio_bytes = client.audio.create_speech(
    model="tts-kokoro",
    input="Hello, this is a test.",
    voice=Voice.KOKORO_DEFAULT
)

# Save to file
with open("speech.mp3", "wb") as f:
    f.write(audio_bytes)

# Using string literals and different format
audio_bytes = client.audio.create_speech(
    model="tts-kokoro",
    input="Hello with different settings.",
    voice="kokoro-default",
    response_format="wav",
    speed=1.2
)

Streaming text-to-speech:

# Stream audio data
stream = client.audio.create_speech(
    model="tts-kokoro",
    input="This is a streamed audio example.",
    voice="kokoro-default",
    stream=True
)

# Write streamed chunks to file
with open("streamed_speech.mp3", "wb") as f:
    for chunk in stream:
        f.write(chunk)
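
Checking arguments locally before a request can surface mistakes earlier. The sketch below validates the documented speed range (0.25 to 4.0) and derives an output file name from the chosen response_format, so that a "wav" response is not saved under an ".mp3" name. validate_speed and output_filename are hypothetical helpers, not part of venice_ai, and using the format string as the file extension is an assumption.

```python
def validate_speed(speed: float) -> float:
    """Return `speed` unchanged if it lies within the documented 0.25 to 4.0 range."""
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"speed must be between 0.25 and 4.0, got {speed}")
    return speed


def output_filename(stem: str, response_format: str = "mp3") -> str:
    """Build a file name whose extension matches the requested audio format."""
    return f"{stem}.{response_format.lower()}"


if __name__ == "__main__":
    print(validate_speed(1.2), output_filename("speech", "wav"))
```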

get_voices(*, model_id: str | None = None, gender: Literal['male', 'female', 'unknown'] | None = None, region_code: str | None = None)

Lists available text-to-speech (TTS) voices, with optional filtering.

This method retrieves information about available voices for TTS models, allowing filtering by model ID, gender, and region code.

Parameters:

model_id (Optional[str]) – If provided, only voices for this TTS model ID are returned.
gender (Optional[Literal['male', 'female', 'unknown']]) – Filter voices by gender (“male”, “female”, or “unknown”). Gender is inferred from the voice ID prefix.
region_code (Optional[str]) – Filter voices by the raw two-letter region/language prefix of the voice ID (e.g., “af” for American female-sounding voices, “zm” for Chinese male-sounding voices).

Returns:

A VoiceList object containing a list of VoiceDetail objects that match the filter criteria, along with information about the applied filters.

Return type:

venice_ai.types.audio.VoiceList

Raises:

venice_ai.exceptions.APIError – If an API error occurs during the request to the underlying models endpoint.

Embeddings Resources

Venice AI Embeddings API resources.

This module provides classes for interacting with the Venice AI Embeddings API, allowing clients to generate embeddings from text or token inputs. These embeddings are vector representations of text that capture semantic meaning and can be used for tasks such as semantic search, clustering, and classification.

class venice_ai.resources.embeddings.AsyncEmbeddings(client: venice_ai._resource.AsyncClientT)

Provides access to text embedding generation operations (asynchronous).

This class manages asynchronous embedding operations through the Venice AI API. It provides the same functionality as the synchronous Embeddings class but uses async/await patterns for non-blocking operations. Embeddings are vector representations of text that capture semantic meaning and can be used for various natural language processing tasks.

Parameters: client (venice_ai._async_client.AsyncVeniceClient) – The async Venice AI client instance used to make API requests.

Generates embeddings for input text(s) asynchronously.

This method sends an asynchronous request to the Venice AI API to generate vector embeddings for the provided text or token inputs using the specified model. The embeddings can be used for semantic search, clustering, classification, and other NLP tasks.

Parameters:

model (str) – The ID of the embedding model to use. Available models can be retrieved using the models API. Example: 'text-embedding-bge-m3'.
input (Union[str, List[str], List[int], List[List[int]]]) – The input text(s) to generate embeddings for. Can be a single string, a list of strings for batch processing, a list of token integers, or a list of token lists. For batch processing, all inputs will be processed together in a single API call.
dimensions (Optional[int]) – The number of dimensions for the output embeddings. If not specified, uses the model’s default dimensionality. Some models support reducing dimensions for efficiency.
encoding_format (Optional[Literal["float", "base64"]]) – The format for the returned embeddings. Defaults to 'float' for numerical arrays. Use 'base64' for base64-encoded string representation.
user (Optional[str]) – A unique identifier representing your end-user. This parameter is supported for compatibility with OpenAI clients but is discarded by the Venice API and does not affect the response.

Returns:

A response object containing the generated embeddings and usage data. The response includes an array of embedding objects, each containing the vector representation and associated metadata.

Return type:

EmbeddingList

Raises:

venice_ai.exceptions.InvalidRequestError – If parameter values are invalid (e.g., empty model or input, unsupported encoding format).
venice_ai.exceptions.AuthenticationError – If the API key is invalid or missing.
venice_ai.exceptions.PermissionDeniedError – If access to the specified model is denied.
venice_ai.exceptions.NotFoundError – If the specified model is not found.
venice_ai.exceptions.RateLimitError – If rate limits are exceeded.
venice_ai.exceptions.APIError – For other API-related errors.

Examples:

Generate an embedding for a single string:

import asyncio
from venice_ai import AsyncVeniceClient

async def create_embedding():
    async with AsyncVeniceClient(api_key="your-api-key") as client:
        response = await client.embeddings.create(
            model="text-embedding-bge-m3",
            input="The quick brown fox jumps over the lazy dog."
        )
        embedding = response.data[0].embedding
        print(f"Embedding dimensions: {len(embedding)}")
        print(f"First 5 dimensions: {embedding[:5]}")

asyncio.run(create_embedding())

Generate embeddings for multiple strings (batch processing):

async def create_batch_embeddings():
    inputs = [
        "First sentence for embedding.",
        "Second sentence for embedding.",
        "Third sentence for embedding."
    ]
    async with AsyncVeniceClient(api_key="your-api-key") as client:
        batch_response = await client.embeddings.create(
            model="text-embedding-bge-m3",
            input=inputs
        )
        for i, data_item in enumerate(batch_response.data):
            print(f"Embedding for '{inputs[i]}' (first 3 dims): {data_item.embedding[:3]}")
        print(f"Total tokens used: {batch_response.usage.total_tokens}")

asyncio.run(create_batch_embeddings())

Using optional parameters:

async def create_custom_embedding():
    async with AsyncVeniceClient(api_key="your-api-key") as client:
        response = await client.embeddings.create(
            model="text-embedding-bge-m3",
            input="Sample text for embedding",
            dimensions=512,  # Reduce dimensions if supported
            encoding_format="base64",  # Get base64-encoded embeddings
            user="user-123"  # Accepted for OpenAI compatibility; ignored by Venice
        )

asyncio.run(create_custom_embedding())
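
When encoding_format="base64" is requested, the embedding arrives as a string that has to be decoded before use. The sketch below assumes the payload is a packed array of little-endian 32-bit floats, which is the convention used by OpenAI-compatible APIs but is not confirmed by this reference; decode_base64_embedding is a hypothetical helper name.

```python
import base64
import struct
from typing import List


def decode_base64_embedding(payload: str) -> List[float]:
    """Decode a base64 string into floats, assuming packed little-endian float32."""
    raw = base64.b64decode(payload)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


if __name__ == "__main__":
    # Round-trip a small vector to show the expected layout.
    sample = base64.b64encode(struct.pack("<3f", 0.5, -1.0, 2.0)).decode("ascii")
    print(decode_base64_embedding(sample))
```

If the decoded length does not match the model's dimensionality, the layout assumption is probably wrong for that model.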

class venice_ai.resources.embeddings.Embeddings(client: venice_ai._resource.SyncClientT)

Bases: APIResource

Provides access to text embedding generation operations.

This class manages synchronous embedding operations through the Venice AI API. Embeddings are vector representations of text that capture semantic meaning and can be used for various natural language processing tasks such as semantic search, clustering, classification, and similarity analysis.

Parameters: client (venice_ai._client.VeniceClient) – The Venice AI client instance used to make API requests.

Generates embeddings for input text(s).

This method sends a request to the Venice AI API to generate vector embeddings for the provided text or token inputs using the specified model. The embeddings can be used for semantic search, clustering, classification, and other NLP tasks.

Parameters:

model (str) – The ID of the embedding model to use. Available models can be retrieved using the models API. Example: 'text-embedding-bge-m3'.
input (Union[str, List[str], List[int], List[List[int]]]) – The input text(s) to generate embeddings for. Can be a single string, a list of strings for batch processing, a list of token integers, or a list of token lists. For batch processing, all inputs will be processed together in a single API call.
dimensions (Optional[int]) – The number of dimensions for the output embeddings. If not specified, uses the model’s default dimensionality. Some models support reducing dimensions for efficiency.
encoding_format (Optional[Literal["float", "base64"]]) – The format for the returned embeddings. Defaults to 'float' for numerical arrays. Use 'base64' for base64-encoded string representation.
user (Optional[str]) – A unique identifier representing your end-user. This parameter is supported for compatibility with OpenAI clients but is discarded by the Venice API and does not affect the response.

Returns:

A response object containing the generated embeddings and usage data. The response includes an array of embedding objects, each containing the vector representation and associated metadata.

Return type:

EmbeddingList

Raises:

venice_ai.exceptions.InvalidRequestError – If parameter values are invalid (e.g., empty model or input, unsupported encoding format).
venice_ai.exceptions.AuthenticationError – If the API key is invalid or missing.
venice_ai.exceptions.PermissionDeniedError – If access to the specified model is denied.
venice_ai.exceptions.NotFoundError – If the specified model is not found.
venice_ai.exceptions.RateLimitError – If rate limits are exceeded.
venice_ai.exceptions.APIError – For other API-related errors.

Examples:

Generate an embedding for a single string:

from venice_ai import VeniceClient

client = VeniceClient(api_key="your-api-key")
response = client.embeddings.create(
    model="text-embedding-bge-m3",
    input="The quick brown fox jumps over the lazy dog."
)
embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
print(f"First 5 dimensions: {embedding[:5]}")

Generate embeddings for multiple strings (batch processing):

inputs = [
    "First sentence for embedding.",
    "Second sentence for embedding.",
    "Third sentence for embedding."
]
batch_response = client.embeddings.create(
    model="text-embedding-bge-m3",
    input=inputs
)
for i, data_item in enumerate(batch_response.data):
    print(f"Embedding for '{inputs[i]}' (first 3 dims): {data_item.embedding[:3]}")
print(f"Total tokens used: {batch_response.usage.total_tokens}")

Using optional parameters:

response = client.embeddings.create(
    model="text-embedding-bge-m3",
    input="Sample text for embedding",
    dimensions=512,  # Reduce dimensions if supported
    encoding_format="base64",  # Get base64-encoded embeddings
    user="user-123"  # Accepted for OpenAI compatibility; ignored by Venice
)
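
Once embeddings are available as lists of floats, semantic similarity is commonly measured with cosine similarity. A minimal pure-Python sketch follows; cosine_similarity is a hypothetical helper, and the short vectors stand in for the values of response.data[i].embedding.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy vectors; real embeddings have many more dimensions.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.5]))
```

For large collections, a vectorized library such as NumPy would typically replace the Python loops.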

class venice_ai.resources.embeddings.Embeddings(client: venice_ai._resource.SyncClientT)[source]

Bases: APIResource

Provides access to text embedding generation operations.

Parameters:: client (venice_ai._client.VeniceClient) – The Venice AI client instance used to make API requests.

Generates embeddings for input text(s).

Parameters:

model (str) – The ID of the embedding model to use. Available models can be retrieved using the models API. Example: 'text-embedding-bge-m3'.
input (Union[str, List[str], List[int], List[List[int]]]) – The input text(s) to generate embeddings for. Can be a single string, a list of strings for batch processing, a list of token integers, or a list of token lists. For batch processing, all inputs will be processed together in a single API call.
dimensions (Optional[int]) – The number of dimensions for the output embeddings. If not specified, uses the model’s default dimensionality. Some models support reducing dimensions for efficiency.
encoding_format (Optional[Literal["float", "base64"]]) – The format for the returned embeddings. Defaults to 'float' for numerical arrays. Use 'base64' for base64-encoded string representation.
user (Optional[str]) – A unique identifier representing your end-user. This parameter is supported for compatibility with OpenAI clients but is discarded by the Venice API and does not affect the response.

Returns:

A response object containing the generated embeddings and usage data. The response includes an array of embedding objects, each containing the vector representation and associated metadata.

Return type:

EmbeddingList

Raises:

venice_ai.exceptions.InvalidRequestError – If parameter values are invalid (e.g., empty model or input, unsupported encoding format).
venice_ai.exceptions.AuthenticationError – If the API key is invalid or missing.
venice_ai.exceptions.PermissionDeniedError – If access to the specified model is denied.
venice_ai.exceptions.NotFoundError – If the specified model is not found.
venice_ai.exceptions.RateLimitError – If rate limits are exceeded.
venice_ai.exceptions.APIError – For other API-related errors.

Examples:

Generate an embedding for a single string:

from venice_ai import VeniceClient

client = VeniceClient(api_key="your-api-key")
response = client.embeddings.create(
    model="text-embedding-bge-m3",
    input="The quick brown fox jumps over the lazy dog."
)
embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
print(f"First 5 dimensions: {embedding[:5]}")

Generate embeddings for multiple strings (batch processing):

inputs = [
    "First sentence for embedding.",
    "Second sentence for embedding.",
    "Third sentence for embedding."
]
batch_response = client.embeddings.create(
    model="text-embedding-bge-m3",
    input=inputs
)
for i, data_item in enumerate(batch_response.data):
    print(f"Embedding for '{inputs[i]}' (first 3 dims): {data_item.embedding[:3]}")
print(f"Total tokens used: {batch_response.usage.total_tokens}")

Using optional parameters:

response = client.embeddings.create(
    model="text-embedding-bge-m3",
    input="Sample text for embedding",
    dimensions=512,  # Reduce dimensions if supported
    encoding_format="base64",  # Get base64-encoded embeddings
    user="user-123"  # Accepted for OpenAI compatibility; discarded by the Venice API
)
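
With encoding_format="base64", each embedding arrives as a string rather than a list of floats. The sketch below shows one way to decode such a string. It assumes the payload is a packed array of little-endian 32-bit floats, as in OpenAI-compatible APIs; this reference does not confirm that layout, and the helper name decode_base64_embedding is hypothetical.

```python
import base64
import struct
from typing import List


def decode_base64_embedding(encoded: str) -> List[float]:
    """Decode a base64 string into floats (assumes little-endian float32)."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip check with locally generated data (no API call involved).
sample = [0.5, -1.25, 2.0]
encoded_sample = base64.b64encode(struct.pack("<3f", *sample)).decode("ascii")
print(decode_base64_embedding(encoded_sample))
```

If the assumption holds, the same helper could be applied to response.data[0].embedding from the call above.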

class venice_ai.resources.embeddings.AsyncEmbeddings(client: venice_ai._resource.AsyncClientT)[source]

Bases: AsyncAPIResource

Provides access to text embedding generation operations (asynchronous).

Parameters: client (venice_ai._async_client.AsyncVeniceClient) – The async Venice AI client instance used to make API requests.

Generates embeddings for input text(s) asynchronously.

Parameters:

model (str) – The ID of the embedding model to use. Available models can be retrieved using the models API. Example: 'text-embedding-bge-m3'.
input (Union[str, List[str], List[int], List[List[int]]]) – The input text(s) to generate embeddings for. Can be a single string, a list of strings for batch processing, a list of token integers, or a list of token lists. For batch processing, all inputs will be processed together in a single API call.
dimensions (Optional[int]) – The number of dimensions for the output embeddings. If not specified, uses the model’s default dimensionality. Some models support reducing dimensions for efficiency.
encoding_format (Optional[Literal["float", "base64"]]) – The format for the returned embeddings. Defaults to 'float' for numerical arrays. Use 'base64' for base64-encoded string representation.
user (Optional[str]) – A unique identifier representing your end-user. This parameter is supported for compatibility with OpenAI clients but is discarded by the Venice API and does not affect the response.

Returns:

A response object containing the generated embeddings and usage data. The response includes an array of embedding objects, each containing the vector representation and associated metadata.

Return type:

EmbeddingList

Raises:

venice_ai.exceptions.InvalidRequestError – If parameter values are invalid (e.g., empty model or input, unsupported encoding format).
venice_ai.exceptions.AuthenticationError – If the API key is invalid or missing.
venice_ai.exceptions.PermissionDeniedError – If access to the specified model is denied.
venice_ai.exceptions.NotFoundError – If the specified model is not found.
venice_ai.exceptions.RateLimitError – If rate limits are exceeded.
venice_ai.exceptions.APIError – For other API-related errors.

Examples:

Generate an embedding for a single string:

import asyncio
from venice_ai import AsyncVeniceClient

async def create_embedding():
    async with AsyncVeniceClient(api_key="your-api-key") as client:
        response = await client.embeddings.create(
            model="text-embedding-bge-m3",
            input="The quick brown fox jumps over the lazy dog."
        )
        embedding = response.data[0].embedding
        print(f"Embedding dimensions: {len(embedding)}")
        print(f"First 5 dimensions: {embedding[:5]}")

asyncio.run(create_embedding())

Generate embeddings for multiple strings (batch processing):

async def create_batch_embeddings():
    inputs = [
        "First sentence for embedding.",
        "Second sentence for embedding.",
        "Third sentence for embedding."
    ]
    async with AsyncVeniceClient(api_key="your-api-key") as client:
        batch_response = await client.embeddings.create(
            model="text-embedding-bge-m3",
            input=inputs
        )
        for i, data_item in enumerate(batch_response.data):
            print(f"Embedding for '{inputs[i]}' (first 3 dims): {data_item.embedding[:3]}")
        print(f"Total tokens used: {batch_response.usage.total_tokens}")

asyncio.run(create_batch_embeddings())

Using optional parameters:

async def create_custom_embedding():
    async with AsyncVeniceClient(api_key="your-api-key") as client:
        response = await client.embeddings.create(
            model="text-embedding-bge-m3",
            input="Sample text for embedding",
            dimensions=512,  # Reduce dimensions if supported
            encoding_format="base64",  # Get base64-encoded embeddings
            user="user-123"  # Accepted for OpenAI compatibility; discarded by the Venice API
        )

asyncio.run(create_custom_embedding())
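
Embedding vectors returned in 'float' format are commonly compared with cosine similarity. The following is a minimal pure-Python sketch of that comparison; the function name cosine_similarity is illustrative and not part of the SDK, and the vectors in the usage line are hard-coded stand-ins for values such as data_item.embedding.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in vectors; real embeddings have many more dimensions.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```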

Billing Resources¶

Venice AI Billing API resources.

This module provides classes for interacting with the Venice AI Billing API, allowing clients to retrieve usage information in different formats (JSON or CSV). The module offers both synchronous and asynchronous interfaces for retrieving billing data, designed to integrate smoothly with the respective client types.

class venice_ai.resources.billing.AsyncBilling(client: venice_ai._resource.AsyncClientT)[source]

Bases: AsyncAPIResource

Provides access to billing and usage data operations using asynchronous requests.

Manages asynchronous billing operations, providing methods to retrieve billing usage data in either JSON or CSV format. It is designed to work with AsyncVeniceClient, allowing non-blocking API calls in asynchronous applications. The class handles request formatting, response parsing, and type conversion based on the requested format.

Parameters: client (venice_ai.AsyncVeniceClient) – The asynchronous Venice AI client instance used for making API requests.

Retrieves billing usage information asynchronously.

Fetches usage data from the Venice AI Billing API with various filtering options, using asynchronous HTTP requests. The method sets the appropriate ‘Accept’ header ('application/json' or 'text/csv') based on the requested format, which determines how the API processes and returns the data.

Parameters:

format (venice_ai.types.billing.BillingFormatEnum) – Response format (JSON or CSV). Defaults to JSON.
currency (Optional[str]) – Optional currency filter (USD or VCU).
startDate (Optional[str]) – Optional start date (ISO 8601 format, e.g., "2025-01-01T00:00:00Z").
endDate (Optional[str]) – Optional end date (ISO 8601 format, e.g., "2025-05-01T00:00:00Z").
limit (Optional[int]) – Optional number of items per page (1-500, default 200).
page (Optional[int]) – Optional page number for pagination (default 1).
sortOrder (Optional[str]) – Optional sort order for timestamp (asc/desc, default 'desc').

Returns:

Billing usage data as BillingUsageResponse for JSON, or bytes for CSV.

Return type:

Union[venice_ai.types.billing.BillingUsageResponse, bytes]

Raises:

venice_ai.exceptions.InvalidRequestError – If parameter values are invalid.
venice_ai.exceptions.AuthenticationError – If the API key is invalid.
venice_ai.exceptions.PermissionDeniedError – If access is denied.
venice_ai.exceptions.RateLimitError – If rate limits are exceeded.
venice_ai.exceptions.APIError – For other API-related errors.

Example

import asyncio
from venice_ai import AsyncVeniceClient
from venice_ai.types.billing import BillingFormatEnum

async def get_usage_example():
    async with AsyncVeniceClient(api_key="your-api-key") as client:
        # Get JSON usage data
        usage_response = await client.billing.get_usage(
            startDate="2025-01-01T00:00:00Z",
            endDate="2025-05-01T00:00:00Z",
            limit=10,
            page=1
        )

        # Access usage records
        for usage_record in usage_response['data']:
            print(f"Date: {usage_record['timestamp']}, Cost: {usage_record['amount']}")

        # Get CSV usage data
        usage_csv = await client.billing.get_usage(
            format=BillingFormatEnum.CSV,
            startDate="2025-01-01T00:00:00Z",
            endDate="2025-05-01T00:00:00Z"
        )

        # Write CSV to file
        with open("usage.csv", "wb") as f:
            f.write(usage_csv)

asyncio.run(get_usage_example())
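
When the CSV format is requested, the method returns raw bytes. The sketch below shows one way to turn those bytes into row dictionaries with the standard csv module. It assumes the payload is UTF-8 text with a header row; the column names in the sample are hypothetical, and parse_usage_csv is not part of the SDK.

```python
import csv
import io
from typing import Dict, List


def parse_usage_csv(usage_csv: bytes) -> List[Dict[str, str]]:
    """Parse CSV bytes into a list of row dicts keyed by the header row."""
    text = usage_csv.decode("utf-8")  # Assumption: the payload is UTF-8.
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical payload; the real column names may differ.
sample_csv = b"timestamp,amount\n2025-01-01T00:00:00Z,0.12\n"
rows = parse_usage_csv(sample_csv)
print(rows[0]["timestamp"], rows[0]["amount"])
```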

class venice_ai.resources.billing.Billing(client: venice_ai._resource.SyncClientT)[source]

Bases: APIResource

Provides access to billing and usage data operations.

Manages synchronous billing operations, providing methods to retrieve billing usage data in either JSON or CSV format. It handles requests to the Venice AI Billing API endpoints, managing request parameters, headers, and response formats. When initialized with a VeniceClient instance, it inherits the client’s configuration, including API key authentication.

Parameters: client (venice_ai.VeniceClient) – The Venice AI client instance used for making API requests.

Retrieves billing usage information.

Fetches usage data from the Venice AI Billing API with various filtering options. The response format is determined by the ‘format’ parameter and corresponding ‘Accept’ header: ‘application/json’ for JSON format or ‘text/csv’ for CSV format.

Parameters:

format (venice_ai.types.billing.BillingFormatEnum) – Response format (JSON or CSV). Defaults to JSON.
currency (Optional[str]) – Optional currency filter (USD or VCU).
startDate (Optional[str]) – Optional start date (ISO 8601 format, e.g., "2025-01-01T00:00:00Z").
endDate (Optional[str]) – Optional end date (ISO 8601 format, e.g., "2025-05-01T00:00:00Z").
limit (Optional[int]) – Optional number of items per page (1-500, default 200).
page (Optional[int]) – Optional page number for pagination (default 1).
sortOrder (Optional[str]) – Optional sort order for timestamp (asc/desc, default 'desc').

Returns:

Billing usage data as BillingUsageResponse for JSON, or bytes for CSV.

Return type:

Union[venice_ai.types.billing.BillingUsageResponse, bytes]

Raises:

venice_ai.exceptions.InvalidRequestError – If parameter values are invalid.
venice_ai.exceptions.AuthenticationError – If the API key is invalid.
venice_ai.exceptions.PermissionDeniedError – If access is denied.
venice_ai.exceptions.RateLimitError – If rate limits are exceeded.
venice_ai.exceptions.APIError – For other API-related errors.

Example

from venice_ai import VeniceClient
from venice_ai.types.billing import BillingFormatEnum

client = VeniceClient(api_key="your-api-key")

# Get JSON usage data
usage_response = client.billing.get_usage(
    startDate="2025-01-01T00:00:00Z",
    endDate="2025-05-01T00:00:00Z",
    limit=10,
    page=1
)

# Access usage records
for usage_record in usage_response['data']:
    print(f"Date: {usage_record['timestamp']}, Cost: {usage_record['amount']}")

# Get CSV usage data
usage_csv = client.billing.get_usage(
    format=BillingFormatEnum.CSV,
    startDate="2025-01-01T00:00:00Z",
    endDate="2025-05-01T00:00:00Z"
)

# Write CSV to file
with open("usage.csv", "wb") as f:
    f.write(usage_csv)

Characters Resources¶

class venice_ai.resources.characters.AsyncCharacters(client: venice_ai._resource.AsyncClientT)[source]

Bases: AsyncAPIResource

Provides methods for managing AI character definitions asynchronously.

Provides asynchronous methods to list available characters. This class mirrors the functionality of the synchronous Characters resource but operates in an asynchronous context.

Parameters: client (venice_ai._async_client.AsyncVeniceClient) – The async Venice AI client instance used for API requests.

Warning

The Characters API is currently in Preview and may change in future releases.

async list(*, extra_headers: httpx.Headers | None = None, extra_query: Dict[str, Any] | None = None, extra_body: Dict[str, Any] | None = None, timeout: float | None = None)[source]

List all characters asynchronously.

Retrieves a list of all characters usable with the Venice AI API asynchronously. Each character includes details such as ID, name, and description.

Parameters:

extra_headers (Optional[httpx.Headers]) – Additional HTTP headers to include in the request.
extra_query (Optional[Dict[str, Any]]) – Additional query parameters to include in the request.
extra_body (Optional[Dict[str, Any]]) – Additional body parameters to include in the request.
timeout (Optional[float]) – Request timeout in seconds.

Returns:

A list of available characters.

Return type:

CharacterList

Raises:

venice_ai.exceptions.APIError – If the API request fails.

Example

import asyncio
from venice_ai import AsyncVeniceClient

async def main():
    # The context manager closes the client even if the request raises.
    async with AsyncVeniceClient(api_key="your-api-key") as client:
        characters_response = await client.characters.list()
        for character in characters_response.data:
            print(f"Character ID: {character.slug}, Name: {character.name}")

asyncio.run(main())

class venice_ai.resources.characters.Characters(client: venice_ai._resource.SyncClientT)[source]

Bases: APIResource

Provides methods for managing AI character definitions.

Characters represent pre-defined personalities or specialized AI assistants that can be referenced in chat completions requests. This resource provides methods to list available characters.

Parameters: client (venice_ai._client.VeniceClient) – The Venice AI client instance used for API requests.

Warning

The Characters API is currently in Preview and may change in future releases.

list(*, extra_headers: httpx.Headers | None = None, extra_query: Dict[str, Any] | None = None, extra_body: Dict[str, Any] | None = None, timeout: float | None = None)[source]

List all characters.

Retrieves a list of all characters usable with the Venice AI API. Each character includes details such as ID, name, and description.

Parameters:

extra_headers (Optional[httpx.Headers]) – Additional HTTP headers to include in the request.
extra_query (Optional[Dict[str, Any]]) – Additional query parameters to include in the request.
extra_body (Optional[Dict[str, Any]]) – Additional body parameters to include in the request.
timeout (Optional[float]) – Request timeout in seconds.

Returns:

A list of available characters.

Return type:

CharacterList

Raises:

venice_ai.exceptions.APIError – If the API request fails.

Example

from venice_ai import VeniceClient

client = VeniceClient(api_key="your-api-key")
characters_response = client.characters.list()
for character in characters_response.data:
    print(f"Character ID: {character.slug}, Name: {character.name}")
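
Since characters can be referenced in chat completions requests, a listed character's slug can be carried into a request payload. The sketch below looks up a slug by name and builds such a payload. The character_slug key under venice_parameters is an assumption not documented in this section, the model ID is a placeholder, and both helper functions are hypothetical. The character objects are stand-ins for the items in characters_response.data.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, Optional


def find_slug_by_name(characters: Iterable[Any], name: str) -> Optional[str]:
    """Return the slug of the first character whose name matches, if any."""
    for character in characters:
        if character.name.lower() == name.lower():
            return character.slug
    return None


def build_character_request(model: str, slug: str, prompt: str) -> Dict[str, Any]:
    """Build a chat completion payload that references a character."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: the character is selected through this key.
        "venice_parameters": {"character_slug": slug},
    }


# Stand-in objects with the two attributes used in the example above.
characters = [
    SimpleNamespace(slug="example-guide", name="Example Guide"),
    SimpleNamespace(slug="example-tutor", name="Example Tutor"),
]
slug = find_slug_by_name(characters, "example tutor")
payload = build_character_request("your-model-id", slug, "Hello!")
print(payload["venice_parameters"])
```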

Type Definitions¶

Type definitions for Venice AI Chat Completions API.

This module contains Pydantic models for response objects and TypedDict definitions for request objects in the Venice AI Chat Completions API, including support for tools, tool calls, log probabilities, and streaming.

class venice_ai.types.chat.ChatCompletion(**data: Any)[source]

Represents the complete response from a chat completion request.

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.chat.ChatCompletionChoice(**data: Any)[source]

Represents a single completion choice generated by the model.

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.chat.ChatCompletionChoiceLogprobs(**data: Any)[source]

Aggregates log probability information for all tokens in a completion choice.

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.chat.ChatCompletionChunk(**data: Any)[source]

Represents a single chunk in a streaming chat completion response.

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.chat.ChatCompletionChunkChoice(**data: Any)[source]

Represents a single choice within a streaming chat completion chunk.

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.chat.ChatCompletionChunkChoiceDelta(**data: Any)[source]

Contains the incremental changes for a choice in a streaming chat completion.

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
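
Because each delta carries only an incremental change, a consumer typically concatenates the pieces to rebuild the full message. The sketch below shows one way to do that. The attribute names choices, index, delta, and content follow the OpenAI-compatible layout and are assumptions here, since the fields of these models are not listed in this section; accumulate_content is a hypothetical helper, and the chunks are stand-in objects rather than real ChatCompletionChunk instances.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List, Optional


def accumulate_content(chunks: Iterable[Any]) -> Dict[int, str]:
    """Join the text deltas of each choice, keyed by choice index."""
    parts: Dict[int, List[str]] = {}
    for chunk in chunks:
        for choice in chunk.choices:
            # Assumption: text arrives in delta.content and may be None.
            text = getattr(choice.delta, "content", None)
            if text:
                parts.setdefault(choice.index, []).append(text)
    return {index: "".join(pieces) for index, pieces in parts.items()}


def _chunk(text: Optional[str]) -> SimpleNamespace:
    """Build a stand-in chunk with a single choice at index 0."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(index=0, delta=delta)])


stream = [_chunk("Hel"), _chunk(None), _chunk("lo")]
print(accumulate_content(stream))
```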

class venice_ai.types.chat.ChatCompletionChunkToolCall(**data: Any)[source]

Represents an incremental tool call within a streaming chat completion chunk. Fields are optional as they arrive incrementally.

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.chat.ChatCompletionChunkToolCallFunction(**data: Any)[source]

Represents function call details within a streaming chat completion chunk. Fields are optional as they arrive incrementally.

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.chat.ChatCompletionMessage(**data: Any)[source]

Represents a message returned by the model in a chat completion response.

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.chat.ChatCompletionTokenLogprob(**data: Any)[source]

Contains comprehensive log probability information for a single token.

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.chat.ChatCompletionTopLogprob(**data: Any)[source]

Represents log probability information for alternative tokens at a specific position.

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
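
A log probability is usually easier to read once converted back to a plain probability. The sketch below does that conversion, assuming the values are natural logarithms as in OpenAI-compatible APIs; this section does not state the base. The (token, logprob) pairs are made-up stand-ins, not real API output, and logprob_to_probability is a hypothetical helper.

```python
import math
from typing import List, Tuple


def logprob_to_probability(logprob: float) -> float:
    """Convert a natural-log probability to a probability in [0, 1]."""
    return math.exp(logprob)


# Made-up alternatives for one token position.
alternatives: List[Tuple[str, float]] = [
    ("Venice", math.log(0.5)),
    ("Rome", math.log(0.25)),
]
for token, logprob in alternatives:
    print(f"{token}: {logprob_to_probability(logprob):.2%}")
```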

class venice_ai.types.chat.ChunkModelFactory(**data: Any)[source]

A protocol for classes that can be instantiated from keyword arguments. Used to define the expected interface for stream_cls in chat completions, where the class’s __init__ method should accept **data.

Parameters: data (typing.Any)
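
The protocol described above can be pictured with typing.Protocol: any class whose __init__ accepts **data satisfies it structurally. The following is a sketch of the idea, not the SDK's definition; KeywordConstructible, MyChunk, and build are hypothetical names.

```python
from typing import Any, Dict, Protocol, Type


class KeywordConstructible(Protocol):
    """Structural type: anything that can be built from keyword arguments."""

    def __init__(self, **data: Any) -> None: ...


class MyChunk:
    """A custom chunk class that simply stores the raw fields it is given."""

    def __init__(self, **data: Any) -> None:
        self.data = data


def build(factory: Type[KeywordConstructible], payload: Dict[str, Any]) -> Any:
    """Instantiate the factory class from a decoded payload."""
    return factory(**payload)


chunk = build(MyChunk, {"id": "chunk-1", "choices": []})
print(chunk.data["id"])
```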

class venice_ai.types.chat.CreateChatCompletionRequest[source]

Defines the complete request structure for creating a chat completion.

This class encapsulates all parameters and options available for chat completion requests, including conversation messages, model selection, generation parameters, tool specifications, and Venice-specific features.

Used as the primary input type for chat completion endpoints, providing comprehensive control over model behavior, output format, tool usage, and specialized features. Supports both streaming and non-streaming completions with extensive customization options.

frequency_penalty: typing.NotRequired[float]: Optional. Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.

logit_bias: typing.NotRequired[typing.Dict[str, int]]: Optional. Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100.

logprobs: typing.NotRequired[bool]: Optional. Whether to return log probabilities of the output tokens, which appear in the logprobs property of the choice object. Defaults to false.

max_completion_tokens: typing.NotRequired[int]: Optional. The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length.

max_temp: typing.NotRequired[float]: Optional. Maximum temperature value for dynamic temperature scaling. Range: 0 <= x <= 2.

max_tokens: typing.NotRequired[int]: Optional. Deprecated. Please use max_completion_tokens instead. The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length.

messages: typing.Sequence[venice_ai.types.chat.MessageParam]: A list of messages comprising the conversation so far.

min_p: typing.NotRequired[float]: Optional. Sets a minimum probability threshold for token selection. Tokens with probabilities below this value are filtered out. Range: 0 <= x <= 1.

min_temp: typing.NotRequired[float]: Optional. Minimum temperature value for dynamic temperature scaling. Range: 0 <= x <= 2.

model: str: ID of the model to use. See the model endpoint compatibility table for details on which models support this endpoint.

n: typing.NotRequired[int]: Optional. How many chat completion choices to generate for each input message. Note that you will be charged for the number of generated tokens across all of the choices. Defaults to 1.

parallel_tool_calls: typing.NotRequired[bool]: Optional. Whether to enable parallel function calling during tool use.

presence_penalty: typing.NotRequired[float]: Optional. Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.

repetition_penalty: typing.NotRequired[float]: Optional. Penalty for token repetition.

response_format: typing.NotRequired[venice_ai.types.chat.ResponseFormat]: Optional. An object specifying the format that the model must output. Setting to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON.

seed: typing.NotRequired[int]: Optional. This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

stop: typing.NotRequired[typing.Union[str, typing.List[str]]]: Optional. Up to 4 sequences where the API will stop generating further tokens.

stop_token_ids: typing.NotRequired[typing.List[int]]: Optional. List of token IDs at which to stop generation.

stream: typing.NotRequired[bool]: Optional. If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. Defaults to false.

stream_options: typing.NotRequired[venice_ai.types.chat.StreamOptions]: Optional. Options for streaming response. Only used if stream is true.

temperature: typing.NotRequired[float]: Optional. What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Defaults to 0.7.

tool_choice: typing.NotRequired[typing.Union[typing.Literal['none', 'auto'], venice_ai.types.chat.ToolChoiceObject]]: Optional. Controls which (if any) function is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. Specifying a particular function via {"type": "function", "function": {"name": "my_function"}} forces the model to call that function.

tools: typing.NotRequired[typing.List[venice_ai.types.chat.Tool]]: Optional. A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.

top_k: typing.NotRequired[int]: Optional. Number of highest probability vocabulary tokens to keep for top-k-filtering.

top_logprobs: typing.NotRequired[int]: Optional. An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

top_p: typing.NotRequired[float]: Optional. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Defaults to 1.

user: typing.NotRequired[str]: Optional. A unique identifier representing your end-user, which can help Venice monitor and detect abuse.

venice_parameters: typing.NotRequired[venice_ai.types.chat.VeniceParameters]: Optional. Venice-specific parameters to extend or modify API behavior.
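The ranges and dependencies listed for these fields can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: check_chat_params is a hypothetical helper that works on a plain dict and only covers the limits stated above (temperature, presence_penalty, stop, top_logprobs, stream_options).

```python
from typing import Any, Dict, List


def check_chat_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a request dict (hypothetical helper)."""
    problems: List[str] = []

    # model is the only field in this list that is not marked NotRequired.
    if "model" not in params:
        problems.append("model is required")

    temperature = params.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append("temperature must be between 0 and 2")

    presence = params.get("presence_penalty")
    if presence is not None and not -2.0 <= presence <= 2.0:
        problems.append("presence_penalty must be between -2.0 and 2.0")

    # stop may be a single string or a list of up to 4 strings.
    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")

    top_logprobs = params.get("top_logprobs")
    if top_logprobs is not None:
        if not 0 <= top_logprobs <= 5:
            problems.append("top_logprobs must be between 0 and 5")
        if not params.get("logprobs"):
            problems.append("top_logprobs requires logprobs to be true")

    if "stream_options" in params and not params.get("stream"):
        problems.append("stream_options is only used when stream is true")

    return problems
```

An empty list means none of the documented limits were violated; the API may still reject a request for reasons this sketch does not cover, such as model-specific constraints.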

class venice_ai.types.chat.FunctionDefinition[source]: Defines the structure and parameters of a function that can be called by the model.

class venice_ai.types.chat.MessageParam[source]: Defines the structure of a message in a chat conversation for requests.

class venice_ai.types.chat.ResponseFormat[source]

Specifies the desired output format for the model’s response.

This class enables structured output generation by constraining the model to produce responses in specific formats, particularly JSON. Supports both general JSON mode and schema-constrained JSON generation for applications requiring structured data output.

Used in chat completion requests to ensure the model’s response conforms to expected formats, enabling reliable parsing and processing of model output in structured applications.

json_schema: typing.NotRequired[typing.Dict[str, typing.Any]]: Optional. A JSON schema object that the model’s output must adhere to. Only used if type is json_schema.

type: typing.Literal['json_object', 'json_schema']: Must be one of json_object or json_schema. Setting to json_object enables JSON mode, directing the model to generate a valid JSON object. Setting to json_schema also enables JSON mode and additionally requires the model to generate a JSON object that conforms to the provided JSON schema.
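As an illustration of the two modes, the sketch below builds a response_format value as a plain dict. build_response_format is a hypothetical helper, and the schema is passed through unchanged because the exact layout expected inside json_schema is not specified in this section.

```python
from typing import Any, Dict, Optional


def build_response_format(schema: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a response_format dict (hypothetical helper).

    Without a schema, plain JSON mode is requested. With a schema, the
    type switches to json_schema and the schema is attached as given.
    """
    if schema is None:
        return {"type": "json_object"}
    return {"type": "json_schema", "json_schema": schema}


# Example schema; the field names are illustrative only.
city_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
```

Keeping the choice of mode in one place ensures that json_schema is only ever sent together with type set to json_schema, which is the only case in which the field is used.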

class venice_ai.types.chat.StreamOptions[source]

Configures the behavior and features of streaming chat completion responses.

This class provides options for controlling how streaming responses are delivered, including whether to include usage statistics in the final chunk. Used in chat completion requests when streaming is enabled to customize the streaming behavior according to client needs.

Enables fine-grained control over streaming features, allowing clients to optimize for their specific use cases and processing requirements.

include_usage: bool: If set, an additional chunk will be streamed before the data: [DONE] message. This chunk will contain a usage field, providing token usage information for the entire request.
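The following sketch shows how a consumer of the raw event stream might pick up the extra usage chunk. It assumes each event is a line of the form data: <json>, that text arrives under choices[].delta.content, and that the usage chunk carries a usage object; those chunk field names are assumptions based on the common chat-completion streaming layout, not something this section confirms. collect_stream is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Join streamed text deltas and capture the usage chunk, if any."""
    text_parts: List[str] = []
    usage: Optional[Dict[str, Any]] = None

    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # terminator described for the stream parameter

        chunk = json.loads(payload)
        # The usage chunk arrives before [DONE] when include_usage is set.
        if chunk.get("usage"):
            usage = chunk["usage"]
        # Assumed chunk layout: choices[].delta.content
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)

    return "".join(text_parts), usage
```

When using the SDK's own streaming interface this parsing is presumably handled for you; the sketch only illustrates where the usage chunk sits relative to the data: [DONE] message.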

class venice_ai.types.chat.Tool[source]: Represents a tool that the model can invoke during chat completion.

class venice_ai.types.chat.ToolCall(**data: Any)[source]

Represents a complete tool call made by the model during chat completion. (Response DTO)

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.chat.ToolCallFunction(**data: Any)[source]

Contains the details of a function call made by the model. (Response DTO)

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.chat.ToolChoiceFunction[source]: Specifies a particular function to be called when using structured tool choice.

class venice_ai.types.chat.ToolChoiceObject[source]: Defines the object form of tool choice specification for forcing specific tool usage.

class venice_ai.types.chat.UsageData(**data: Any)[source]

Provides token usage statistics for a chat completion request.

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.chat.VeniceParameters[source]

Contains Venice-specific parameters for customizing chat completion behavior.

This class provides access to Venice AI’s unique features and capabilities, including character personas, web search integration, and system prompt customization. These parameters extend the standard chat completion API with Venice-specific functionality.

Used in chat completion requests to leverage Venice AI’s distinctive features, enabling enhanced conversational experiences and specialized capabilities not available in standard chat completion APIs.

character_slug: str: Optional. The slug of a specific character to use for the completion. This will influence the model’s persona, response style, and behavior patterns.

disable_thinking: bool: Optional. On supported reasoning models, disables thinking and strips the <think></think> blocks from the response.

enable_web_citations: bool: Optional. When web search is enabled, this will request that the LLM cite its sources using a [REF]0[/REF] format.

enable_web_search: typing.Literal['on', 'off', 'auto']: Optional. Controls whether the model can perform web searches to enhance responses. on always enables search, off disables it completely, auto (default) lets the model decide based on context.

include_search_results_in_stream: bool: Optional. Experimental feature. When set to true, the LLM will include search results in the first emitted chunk.

include_venice_system_prompt: bool: Optional. If true (default), the default Venice system prompt will be included. Set to false to exclude it and use only the provided messages.

strip_thinking_response: bool: Optional. Strip <think></think> blocks from the response. Applicable only to reasoning/thinking models.
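To make the <think></think> and [REF]0[/REF] conventions concrete, the sketch below post-processes a reply on the client side. Both helpers are hypothetical and are not part of the SDK; the server already strips thinking blocks when strip_thinking_response is set. The sketch assumes each citation marker wraps a single non-negative integer, and what that number refers to is not specified in this section.

```python
import re
from typing import List

# Non-greedy match so that several separate blocks are removed individually.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
# Assumed marker shape: [REF]<integer>[/REF]
REF_MARKER = re.compile(r"\[REF\](\d+)\[/REF\]")


def strip_thinking(text: str) -> str:
    """Remove <think>...</think> blocks from a reply."""
    return THINK_BLOCK.sub("", text).strip()


def extract_citation_indices(text: str) -> List[int]:
    """Return the numbers found inside [REF]...[/REF] markers, in order."""
    return [int(number) for number in REF_MARKER.findall(text)]


# A venice_parameters value that requests both behaviours from the server.
venice_parameters = {
    "enable_web_search": "on",
    "enable_web_citations": True,
    "strip_thinking_response": True,
}
```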

class venice_ai.types.chat.VeniceParametersResponse(**data: Any)[source]

Venice-specific parameters included in the chat completion response.

Contains information about Venice-specific features that were used or configured for the request, including web search settings, character information, and thinking/reasoning controls.

Parameters: data (typing.Any)

character_slug: typing.Optional[str]: The character slug used for this request, if any.

disable_thinking: bool: Whether thinking was disabled for this request.

enable_web_citations: bool: Whether web citations were enabled for this request.

enable_web_search: typing.Literal['auto', 'off', 'on']: The web search setting that was used for this request.

include_search_results_in_stream: bool: Whether search results were included in the stream.

include_venice_system_prompt: bool: Whether the Venice system prompt was included.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

strip_thinking_response: bool: Whether thinking responses were stripped from the output.

web_search_citations: typing.List[venice_ai.types.chat.WebSearchCitation]: List of web search citations if web search was performed.

class venice_ai.types.chat.WebSearchCitation(**data: Any)[source]

Represents a web search citation in the Venice parameters response.

Contains information about web sources cited by the model when web search is enabled, including the source URL, title, content snippet, and date.

Parameters: data (typing.Any)

content: typing.Optional[str]: A snippet of content from the web source.

date: typing.Optional[str]: The date of the web source in ISO format.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

title: str: The title of the web page or source.

url: str: The URL of the web source.

class venice_ai.types.models.Model[source]

Represents a single AI model available through the Venice.ai API.

Contains comprehensive information about an AI model including its identification and specifications. The model_spec field contains all the detailed information about pricing, capabilities, and constraints.

Parameters:

id (str) – Unique identifier for the model.
object (Literal["model"]) – Object type, always "model".
created (int) – Unix timestamp (seconds) of when the model was created.
owned_by (str) – Organization or user that owns the model.
type (ModelType) – Type of the model (e.g., "text", "image").
model_spec (ModelSpec) – Detailed specifications including pricing, capabilities, and constraints.

class venice_ai.types.models.ModelCapabilities[source]

Defines the functional capabilities and limitations of an AI model.

Specifies which features a model supports, including code optimization, quantization method, reasoning, vision, and various other capabilities. Current fields use camelCase names to match the API response format; the legacy fields listed last keep their original names.

Parameters:

optimizedForCode (bool) – Indicates if the model is optimized for code generation.
quantization (str) – The quantization method used (e.g., “fp16”, “fp8”).
supportsFunctionCalling (bool) – Indicates if the model supports function calling.
supportsReasoning (bool) – Indicates if the model supports reasoning capabilities.
supportsResponseSchema (bool) – Indicates if the model supports structured response schemas.
supportsVision (bool) – Indicates if the model supports vision/image understanding.
supportsWebSearch (bool) – Indicates if the model supports web search integration.
supportsLogProbs (bool) – Indicates if the model supports log probability output.
streaming (bool) – Legacy field - Indicates if the model supports streaming responses.
async (bool) – Legacy field - Indicates if the model supports asynchronous operations.
max_tokens (int) – Legacy field - Maximum number of tokens the model can process.
supports_functions (bool) – Legacy field - Use supportsFunctionCalling instead.
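Because capabilities are nested under model_spec, selecting models by feature means walking two levels of each Model entry. The sketch below does this over plain dicts; models_with_capability is a hypothetical helper and the model IDs in the sample data are placeholders, not real Venice model names.

```python
from typing import Any, Dict, List


def models_with_capability(models: List[Dict[str, Any]], capability: str) -> List[str]:
    """Return the IDs of models whose capabilities set the given flag."""
    matching: List[str] = []
    for model in models:
        # Tolerate entries that lack model_spec or capabilities.
        capabilities = model.get("model_spec", {}).get("capabilities", {})
        if capabilities.get(capability, False):
            matching.append(model["id"])
    return matching


# Placeholder data shaped like the `data` list of a ModelList.
sample_models = [
    {
        "id": "example-vision-model",
        "model_spec": {"capabilities": {"supportsVision": True, "supportsWebSearch": False}},
    },
    {
        "id": "example-search-model",
        "model_spec": {"capabilities": {"supportsVision": False, "supportsWebSearch": True}},
    },
    {"id": "example-bare-model"},
]
```

Treating a missing flag as False keeps the filter conservative: a model is only selected when the API explicitly reports the capability.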

class venice_ai.types.models.ModelCompatibilityList[source]

Represents a mapping of external model names to Venice.ai model IDs.

Provides compatibility mappings that allow users to reference models using external naming conventions (e.g., OpenAI model names) while automatically resolving to the corresponding Venice.ai model IDs.

Parameters:

object (Literal["list"]) – Object type, always "list".
data (Dict[str, str]) – Mapping of external model names to Venice model IDs (e.g., {"gpt-4o": "llama-3.3-70b"}).
type (Optional[ModelType]) – Optional. The type of models in the mapping, if filtered.
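A compatibility mapping can also be applied on the client, for example to log which Venice model an external name resolves to. The sketch below is illustrative only: resolve_model_id is a hypothetical helper, and falling back to the original name when no mapping exists is a design choice of the sketch, not documented API behavior.

```python
from typing import Any, Dict


def resolve_model_id(name: str, compatibility: Dict[str, Any]) -> str:
    """Map an external model name to a Venice model ID, if a mapping exists."""
    mapping = compatibility.get("data", {})
    # Unknown names are returned unchanged so native Venice IDs pass through.
    return mapping.get(name, name)


# Shaped like a ModelCompatibilityList, using the mapping from the example above.
compatibility_list = {"object": "list", "data": {"gpt-4o": "llama-3.3-70b"}}
```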

class venice_ai.types.models.ModelConstraints[source]

Defines parameter constraints and valid ranges for a model.

Contains the allowable ranges and default values for various model parameters like temperature and top_p. Used within the Model class to specify parameter limits.

Parameters:

temperature (ModelConstraintsTemperature) – Constraints for the temperature parameter.
top_p (ModelConstraintsTopP) – Constraints for the top_p parameter.

class venice_ai.types.models.ModelConstraintsTemperature[source]

Defines valid range and default value for the temperature parameter.

Specifies the constraints for the temperature parameter that controls randomness in model outputs. Used within ModelConstraints. Note: The API may only return ‘default’ without min/max values.

Parameters:

default (float) – Default temperature value (always present).
min (float) – Minimum allowed temperature value (optional).
max (float) – Maximum allowed temperature value (optional).

class venice_ai.types.models.ModelConstraintsTopP[source]

Defines valid range and default value for the top_p parameter.

Specifies the constraints for the top_p parameter that controls nucleus sampling in model outputs. Used within ModelConstraints. Note: The API may only return ‘default’ without min/max values.

Parameters:

default (float) – Default top_p value (always present).
min (float) – Minimum allowed top_p value (optional).
max (float) – Maximum allowed top_p value (optional).
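Since min and max may be absent from both constraint types, code that applies them has to treat each bound as optional. The following sketch shows one way to do that for either temperature or top_p; clamp_to_constraint is a hypothetical helper operating on plain dicts.

```python
from typing import Any, Dict, Optional


def clamp_to_constraint(value: Optional[float], constraint: Dict[str, Any]) -> float:
    """Fit a requested value into a constraint with optional min and max."""
    if value is None:
        # `default` is documented as always present.
        return constraint["default"]

    low = constraint.get("min")
    high = constraint.get("max")
    if low is not None:
        value = max(value, low)
    if high is not None:
        value = min(value, high)
    return value
```

For example, clamp_to_constraint(3.0, {"default": 0.7, "min": 0, "max": 2}) yields 2, while a constraint that only carries default leaves any explicit value untouched.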

class venice_ai.types.models.ModelList[source]

Represents a collection of AI models returned by the list models endpoint.

Contains a list of available Model objects along with metadata about the collection. Typically returned when querying for available models through the API.

Parameters:

object (Literal["list"]) – Object type, always "list".
data (List[Model]) – A list of available Model objects.
type (Optional[ModelType]) – Optional. The type of models in the list, if filtered.

class venice_ai.types.models.ModelPricing[source]

Represents pricing information for an AI model.

Defines the cost structure for using a model, including costs per token, image, or time unit depending on the model type. Used within the Model class to provide billing information.

The pricing structure now supports both USD and VCU (Venice Compute Units) for accurate cost tracking and billing.

Parameters:

input (PricingUnit) – Pricing for input operations.
output (PricingUnit) – Pricing for output operations.
input_cost_per_mtok (float) – Legacy: Cost for input per million tokens (USD only).
output_cost_per_mtok (float) – Legacy: Cost for output per million tokens (USD only).
input_cost_per_image (float) – Cost for input per image.
output_cost_per_image (float) – Cost for output per image.
input_cost_per_second (float) – Cost for input per second (e.g., audio).
output_cost_per_second (float) – Cost for output per second (e.g., audio).

class venice_ai.types.models.ModelSpec[source]

Defines the specifications for a model including pricing and capabilities.

Contains detailed information about a model’s pricing structure, capabilities, constraints, and other specifications. This is the main container for model metadata in the API response.

Parameters:

pricing (ModelPricing) – Pricing information for the model with USD and VCU costs.
availableContextTokens (int) – Maximum context window size in tokens.
capabilities (ModelCapabilities) – Model capabilities and feature support.
constraints (ModelConstraints) – Parameter constraints for the model.
name (str) – Human-readable name of the model.
modelSource (str) – URL or reference to the model source.
offline (bool) – Whether the model is currently offline.
traits (List[str]) – List of model traits (e.g., “default”, “fastest”).
beta (bool) – Indicates if this is a beta model (optional).

class venice_ai.types.models.ModelTraitList[source]

Represents a mapping of model traits to their corresponding model IDs.

Provides a way to map semantic model traits (like “default”, “fastest”, “most_accurate”) to specific model IDs. Used for trait-based model selection through the API.

Parameters:

object (Literal["list"]) – Object type, always "list".
data (Dict[str, str]) – Mapping of trait names to model IDs (e.g., {"default": "llama-3.3-70b"}).
type (Optional[ModelType]) – Optional. The type of models in the mapping, if filtered.

venice_ai.types.models.ModelType

Type alias for valid model types in the Venice.ai API.

Defines the available categories of AI models that can be filtered when listing models, traits, or compatibility mappings. Each type represents a different class of AI functionality:

"embedding": Models that generate vector embeddings from text
"image": Models for image generation and manipulation
"text": Models for text generation and chat completions
"tts": Text-to-speech models for audio generation
"upscale": Models for image upscaling and enhancement

alias of Literal['embedding', 'image', 'text', 'tts', 'upscale']

class venice_ai.types.models.PricingDetail[source]

Represents pricing details for input and output.

Parameters:

input (PricingUnit) – Pricing for input (per 1000 tokens for text models).
output (PricingUnit) – Pricing for output (per 1000 tokens for text models).

class venice_ai.types.models.PricingUnit[source]

Represents a pricing unit with both USD and VCU values.

Parameters:

usd (float) – Cost in US dollars.
vcu (float) – Cost in Venice Compute Units.
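A rough cost estimate can be derived from the input and output PricingUnit values together with token counts. The sketch below is an assumption-laden illustration: estimate_cost is a hypothetical helper, and the number of tokens each price covers is passed in explicitly as tokens_per_unit rather than hard-coded, so it should be taken from the pricing documentation for the model in question.

```python
from typing import Any, Dict


def estimate_cost(
    pricing: Dict[str, Any],
    input_tokens: int,
    output_tokens: int,
    tokens_per_unit: int,
) -> Dict[str, float]:
    """Estimate the USD and VCU cost of a request from per-unit prices."""
    totals: Dict[str, float] = {}
    for currency in ("usd", "vcu"):
        totals[currency] = (
            pricing["input"][currency] * input_tokens
            + pricing["output"][currency] * output_tokens
        ) / tokens_per_unit
    return totals


# Illustrative prices only; these are not real Venice rates.
example_pricing = {
    "input": {"usd": 2.0, "vcu": 20.0},
    "output": {"usd": 4.0, "vcu": 40.0},
}
```

With these illustrative prices, 500 input tokens and 250 output tokens at 1000 tokens per unit come to 2.0 USD and 20.0 VCU.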

Type definitions for Venice AI image-related API endpoints.

class venice_ai.types.image.GenerateImageRequest[source]

Represents parameters for an image generation request to the /image/generate endpoint.

This model defines the structure for requesting image generation using Venice AI’s native image generation API. It provides comprehensive control over generation parameters including model selection, prompts, dimensions, and various quality and style settings.

Parameters:

model (str) – ID of the model to use for image generation (e.g., "venice-sd35").
prompt (str) – Text prompt describing the image to generate.
cfg_scale (float) – Optional. Classifier Free Guidance scale (1.0-30.0). Higher values adhere more strictly to the prompt.
embed_exif_metadata (bool) – Optional. Whether to embed generation metadata in EXIF data.
format (Literal["jpeg", "png", "webp"]) – Optional. Output image format.
height (int) – Optional. Height of the generated image in pixels.
hide_watermark (bool) – Optional. Whether to hide the Venice AI watermark from the generated image.
lora_strength (int) – Optional. Strength of LoRA model adaptation (0-100).
negative_prompt (str) – Optional. Text describing what to avoid in the generated image.
return_binary (bool) – Optional. If True, return raw image bytes instead of JSON response with base64 data.
safe_mode (bool) – Optional. Whether to enable content filtering for safer outputs.
seed (int) – Optional. Random seed for reproducible image generation results.
steps (int) – Optional. Number of diffusion steps. Higher values generally improve quality but increase generation time.
style_preset (str) – Optional. Style preset ID to apply to the generated image.
width (int) – Optional. Width of the generated image in pixels.
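The documented ranges for cfg_scale and lora_strength can be enforced while assembling the request. The sketch below builds the request as a plain dict; build_generate_image_request is a hypothetical helper, and dropping options whose value is None is a convenience of the sketch rather than SDK behavior.

```python
from typing import Any, Dict


def build_generate_image_request(model: str, prompt: str, **options: Any) -> Dict[str, Any]:
    """Assemble a GenerateImageRequest-shaped dict, checking documented ranges."""
    cfg_scale = options.get("cfg_scale")
    if cfg_scale is not None and not 1.0 <= cfg_scale <= 30.0:
        raise ValueError("cfg_scale must be between 1.0 and 30.0")

    lora_strength = options.get("lora_strength")
    if lora_strength is not None and not 0 <= lora_strength <= 100:
        raise ValueError("lora_strength must be between 0 and 100")

    request: Dict[str, Any] = {"model": model, "prompt": prompt}
    # Leave unset options out of the payload entirely.
    request.update({key: value for key, value in options.items() if value is not None})
    return request
```

For instance, build_generate_image_request("venice-sd35", "a lighthouse at dusk", cfg_scale=7.5, seed=None) produces a dict with model, prompt, and cfg_scale, and omits seed.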

class venice_ai.types.image.ImageDataItem(**data: Any)[source]

Represents an individual image data item within a SimpleImageResponse.

This model defines the structure for a single generated image in OpenAI-compatible responses, providing either base64-encoded image data or a URL reference to the generated image depending on the requested response format.

Contains either base64 encoded image data or a URL to the image, but not both.

Parameters:

b64_json (Optional[str]) – Base64-encoded image data as a JSON string (when response_format is "b64_json").
url (Optional[str]) – URL pointing to the generated image (when response_format is "url").
data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.image.ImageResponse(**data: Any)[source]

Represents the response structure from the /image/generate endpoint.

This model defines the complete response format for Venice AI’s native image generation API, containing the generated images as base64-encoded data along with metadata including timing information and the original request parameters.

Parameters:

id (str) – Unique identifier for the image generation request, used for tracking and reference.
images (List[str]) – List of base64-encoded image data strings representing the generated images.
request (Optional[Dict[str, Any]]) – Optional. Echo of the original request parameters that were used for generation.
timing (TimingInfo) – Detailed timing information and performance metrics for the request.
created (Optional[str]) – Optional. ISO 8601 timestamp indicating when the image generation was completed.
data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
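Since images holds base64 strings, each entry has to be decoded before it can be written to disk or handed to an image library. The sketch below works on a plain dict with the same keys; decode_images is a hypothetical helper, and with the actual ImageResponse model the list would be read from the images attribute instead.

```python
import base64
from typing import Any, Dict, List


def decode_images(response: Dict[str, Any]) -> List[bytes]:
    """Decode every base64 string in `images` into raw bytes."""
    return [base64.b64decode(encoded) for encoded in response["images"]]


# Placeholder response; the payloads are not real image data.
example_response = {
    "id": "example-request-id",
    "images": [
        base64.b64encode(b"first").decode("ascii"),
        base64.b64encode(b"second").decode("ascii"),
    ],
}
```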

class venice_ai.types.image.ImageStyleEnum(*values)[source]

Represents common or example styles for image generation.

This enum provides a static, curated list of frequently used image styles. For a comprehensive and dynamically updated list of all available styles, it is recommended to use the venice_ai.resources.image.Image.get_available_styles() method (or its asynchronous counterpart venice_ai.resources.image.AsyncImage.get_available_styles()) to fetch the current styles directly from the API.

Example static values:

ANALOG_FILM = 'Analog Film': Vintage analog film photography style

ANIME = 'Anime': Japanese anime/manga artistic style

CINEMATIC = 'Cinematic': Movie-like cinematic style with dramatic lighting

COMIC_BOOK = 'Comic Book': Comic book illustration style

THREE_D_MODEL = '3D Model': 3D rendered model style

class venice_ai.types.image.ImageStyleList[source]

Represents the response structure from the /image/styles endpoint.

This model defines the format for retrieving available image style presets from the Venice AI API. These styles can be used with the style_preset parameter in image generation requests to influence the artistic direction and visual characteristics of generated images.

Parameters:

data (List[str]) – List of available image style preset names that can be used in generation requests.
object (Literal["list"]) – Type of the response object, always "list" to indicate this is a list response.

class venice_ai.types.image.SimpleGenerateImageRequest[source]

Represents parameters for an OpenAI-compatible image generation request to the /images/generations endpoint.

This model provides a simplified interface for image generation that maintains compatibility with OpenAI’s image generation API. It offers streamlined parameters for common image generation tasks while supporting Venice AI’s enhanced features like custom quality settings and output formats.

Parameters:

prompt (str) – Text prompt describing the image to generate.
background (Optional[Literal["transparent", "opaque", "auto"]]) – Optional. Background style for the generated image.
model (str) – ID of the model to use for image generation.
moderation (Optional[Literal["low", "auto"]]) – Optional. Content moderation level to apply during generation.
n (Optional[int]) – Optional. Number of images to generate (typically 1-10).
output_compression (Optional[int]) – Optional. Output image compression level (0-100, where 100 is highest quality).
output_format (Literal["jpeg", "png", "webp"]) – Optional. Output image format.
quality (Optional[Literal["auto", "high", "medium", "low", "hd", "standard"]]) – Optional. Image quality setting that affects generation parameters.
response_format (Optional[Literal["b64_json", "url"]]) – Optional. Format of the response data (base64 JSON or URL).
size (Optional[Literal["auto", "256x256", "512x512", "1024x1024", "1536x1024", "1024x1536", "1792x1024", "1024x1792"]]) – Optional. Dimensions of the generated image in pixels.
style (Optional[Literal["vivid", "natural"]]) – Optional. Artistic style of the generated image.
user (str) – Optional. User identifier for tracking and analytics purposes.

class venice_ai.types.image.SimpleImageResponse(**data: Any)[source]

Represents the response structure from the /images/generations (OpenAI-compatible) endpoint.

This model provides an OpenAI-compatible response format for image generation requests, containing a list of generated images and creation timestamp. It maintains compatibility with existing OpenAI client libraries and workflows.

Parameters:

created (int) – Unix timestamp (seconds since epoch) indicating when the image generation was initiated.
images (List[ImageDataItem]) – List of image data items. The API provides this under the ‘data’ key.
data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.image.TimingInfo(**data: Any)[source]

Represents timing metrics for image generation and processing operations.

This model provides detailed performance information about various stages of image generation requests, enabling monitoring and optimization of processing times across different components of the Venice AI pipeline. All timing values are measured in seconds.

Parameters:

inferenceDuration (float) – Duration of the actual inference/generation process in seconds.
inferencePreprocessingTime (float) – Time spent on preprocessing operations before inference begins, in seconds.
inferenceQueueTime (float) – Time spent waiting in the inference queue before processing starts, in seconds.
total (float) – Total time taken for the entire request from start to completion, in seconds.
data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.image.UpscaleImageRequest[source]

Represents parameters for an image upscaling request to the /image/upscale endpoint.

This model defines the structure for requesting image upscaling and enhancement operations. It allows for scaling existing images to higher resolutions while optionally applying AI-powered enhancements to improve quality and detail.

Note: The ‘image’ data is sent base64-encoded within the JSON payload.

Parameters:

enhance (Literal["true", "false"]) – Optional. Whether to enhance image quality during upscaling ("true" or "false").
enhance_creativity (Optional[float]) – Optional. Creativity level for enhancement (0.0-1.0, where 1.0 is most creative).
enhance_prompt (str) – Optional. Text prompt to guide the enhancement process.
replication (Optional[float]) – Optional. Replication factor for matching the original image (0.0-1.0, where 1.0 matches exactly).
scale (float) – Optional. Scaling factor for upscaling (e.g., 2.0 for 2x upscaling).
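Two details of this request are easy to get wrong: the image travels as base64 text, and enhance is a string literal rather than a boolean. The sketch below shows both; build_upscale_payload is a hypothetical helper, and the default scale of 2.0 is chosen for the example only.

```python
import base64
from typing import Any, Dict


def build_upscale_payload(
    image_bytes: bytes, scale: float = 2.0, enhance: bool = False
) -> Dict[str, Any]:
    """Assemble an UpscaleImageRequest-shaped dict from raw image bytes."""
    return {
        # The image is base64-encoded inside the JSON payload.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "scale": scale,
        # enhance is typed as Literal["true", "false"], so convert the bool.
        "enhance": "true" if enhance else "false",
    }
```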

Type definitions for Venice AI API Keys functionality.

class venice_ai.types.api_keys.ApiKey[source]

Represents a complete API key object in the Venice AI system.

This type defines the structure of an API key as returned by the Venice AI API key management endpoints. Contains all metadata, configuration, and usage information associated with an API key, including its type, limits, creation details, and current usage statistics.

Retrieved from /api_keys endpoint.

apiKeyType: typing.Literal['INFERENCE', 'ADMIN']: Type of the API key, determining its access permissions and capabilities.

consumptionLimits: venice_ai.types.api_keys.ConsumptionLimit: Consumption limits and spending constraints associated with this API key.

createdAt: typing.Optional[str]: ISO 8601 timestamp indicating when the API key was created.

description: str: Human-readable description or name assigned to the API key.

expiresAt: typing.Optional[str]: ISO 8601 timestamp when the API key expires, or None if it never expires.

id: str: Unique identifier for the API key used in management operations.

last6Chars: str: Last 6 characters of the actual API key value for identification purposes.

lastUsedAt: typing.Optional[str]: ISO 8601 timestamp of the most recent API request made with this key.

usage: venice_ai.types.api_keys.ApiKeyUsage: Current usage statistics and consumption metrics for this API key.
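The timestamp fields are ISO 8601 strings, so checking whether a key has expired requires parsing expiresAt and handling the case where it is None. The sketch below is a hypothetical helper over a plain dict; it assumes a trailing Z denotes UTC and treats timestamps without an offset as UTC, neither of which is confirmed by this section.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def is_expired(api_key: Dict[str, Any], now: Optional[datetime] = None) -> bool:
    """Return True if the key's expiresAt lies at or before `now`."""
    expires_at = api_key.get("expiresAt")
    if not expires_at:
        return False  # None means the key never expires

    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # assumption: naive means UTC

    if now is None:
        now = datetime.now(timezone.utc)
    return expiry <= now
```

Passing now explicitly keeps the check deterministic in tests; in normal use it can be omitted.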

class venice_ai.types.api_keys.ApiKeyCreateRequest[source]

Request payload for creating a new API key.

This type defines the structure of the request body used when creating a new API key through the Venice AI API. Includes all configurable parameters such as key type, consumption limits, expiration settings, and optional Web3 integration parameters.

Used with POST /api_keys endpoint.

apiKeyType: typing.Literal['INFERENCE', 'ADMIN']: Type of API key to create, determining access permissions and capabilities.

consumptionLimit: venice_ai.types.api_keys.ConsumptionLimit: Spending and usage limits to apply to the new API key.

description: str: Human-readable description or name for the new API key.

expiresAt: typing.Optional[str]: Optional expiration date in ISO 8601 format, or empty string for no expiration.

web3_address: typing.Optional[str]: Optional Web3 wallet address for blockchain-authenticated API keys.

web3_network_id: typing.Optional[str]: Optional Web3 network identifier for blockchain-authenticated API keys.

class venice_ai.types.api_keys.ApiKeyCreateResponse[source]

Response payload returned after successful API key creation.

This type represents the response structure returned by the Venice AI API when a new API key is successfully created. Contains the complete details of the newly created key, including the secret key value which is only shown once during creation for security purposes.

Contains the newly created API key details.

data: typing.Dict[str, typing.Union[str, venice_ai.types.api_keys.ConsumptionLimit, None]]: Dictionary containing the created API key details, including the secret key value (shown only once).

success: bool: Boolean flag indicating whether the API key creation operation was successful.

class venice_ai.types.api_keys.ApiKeyGenerateWeb3KeyCreateRequest[source]

Request payload for creating a new Web3-authenticated API key.

This type defines the structure of the request body used when creating a Web3 API key through wallet signature verification. Includes all standard API key parameters plus Web3-specific fields for address verification, signature proof, and the authentication token.

Used with POST /api_keys/generate_web3_key endpoint.

address: str: Web3 wallet address used for blockchain-based authentication.

apiKeyType: typing.Literal['INFERENCE', 'ADMIN']: Type of API key to create, determining access permissions and capabilities.

consumptionLimit: venice_ai.types.api_keys.ConsumptionLimit: Spending and usage limits to apply to the new Web3 API key.

description: str: Human-readable description or name for the new Web3 API key.

expiresAt: typing.Optional[str]: Optional expiration date in ISO 8601 format, or None for no expiration.

signature: str: Cryptographic signature proving ownership of the specified wallet address.

token: str: Authentication token obtained from the preliminary GET request.

class venice_ai.types.api_keys.ApiKeyGenerateWeb3KeyCreateResponse[source]

Response payload returned after successful Web3 API key creation.

This type represents the response structure returned by the Venice AI API when a new Web3 API key is successfully created through wallet signature verification. Contains the complete details of the newly created key, including the secret key value which is only shown once during creation.

Response format for Web3 API key creation.

Contains the newly created API key details.

data: typing.Dict[str, typing.Union[str, venice_ai.types.api_keys.ConsumptionLimit, None]]: Dictionary containing the created Web3 API key details, including the secret key value (shown only once).

success: bool: Boolean flag indicating whether the Web3 API key creation operation was successful.

class venice_ai.types.api_keys.ApiKeyGenerateWeb3KeyGetResponse[source]

Response payload for Web3 key generation token retrieval.

This type represents the response structure returned when requesting a token for Web3 API key generation. The token is required as part of the Web3 key creation process to ensure secure authentication through wallet signature verification.

Response from the GET /api_keys/generate_web3_key endpoint.

Contains the token needed for Web3 key generation.

data: typing.Dict[str, str]: Dictionary containing the authentication token required for subsequent Web3 key creation.

success: bool: Boolean flag indicating whether the token retrieval operation was successful.

class venice_ai.types.api_keys.ApiKeyList[source]

Response payload containing a collection of API key objects.

This type represents the response structure returned by the Venice AI API when retrieving a list of API keys. Provides a standardized container for multiple API key objects with metadata indicating the response type.

Retrieved from GET /api_keys endpoint.

data: typing.List[venice_ai.types.api_keys.ApiKey]: Array of API key objects containing metadata and configuration details.

object: typing.Literal['list']: Response type identifier, always “list” for collection responses.

class venice_ai.types.api_keys.ApiKeyRateLimitItem[source]

Represents rate limit configuration for a specific API model.

This type defines the rate limiting rules applied to a particular model when accessed through an API key. Contains the model identifier and associated rate limit policies that govern request frequency and volume.

apiModelId: str: Unique identifier of the API model to which these rate limits apply.

rateLimits: typing.List[typing.Dict[str, typing.Union[float, str]]]: Array of rate limiting rules and policies governing usage of this model.

class venice_ai.types.api_keys.ApiKeyUsage[source]

Represents usage statistics and metrics for an API key.

This type encapsulates usage information for an API key, providing insights into consumption patterns over specific time periods. Used to track and monitor API key activity for billing and rate limiting purposes.

trailingSevenDays: typing.Dict[str, str]: Usage statistics for the trailing 7-day period, containing ‘usd’ and ‘vcu’ consumption values.

class venice_ai.types.api_keys.ApiTier[source]

Represents API tier information and billing configuration.

This type defines the characteristics of an API tier, including its identifier and billing status. API tiers determine access levels, rate limits, and whether usage is subject to charges.

id: str: Unique identifier of the API tier level.

isCharged: bool: Boolean flag indicating whether usage under this tier incurs billing charges.

class venice_ai.types.api_keys.Balances[source]

Represents current account balances in supported currencies.

This type contains the available balances for an account across different currency types supported by the Venice AI platform, including traditional USD and Venice Compute Units (VCU).

USD: float: Current account balance in US Dollars.

VCU: float: Current account balance in Venice Compute Units.

class venice_ai.types.api_keys.ConsumptionLimit[source]

Defines consumption limits for API keys within each billing epoch.

This type represents the spending and usage constraints that can be applied to API keys to control resource consumption. Limits can be specified in both USD currency and Venice Compute Units (VCU).

usd: typing.Optional[float]: Optional spending limit in US Dollars for the billing period.

vcu: typing.Optional[float]: Optional usage limit in Venice Compute Units for the billing period.

class venice_ai.types.api_keys.RateLimitInfo[source]

Comprehensive rate limit and access information for an API key.

This type represents the complete rate limiting context for an API key, including current access permissions, tier information, account balances, key expiration details, and specific rate limit configurations. Used to determine whether requests can be processed and what limits apply.

Retrieved from /api_keys/rate_limits endpoint.

accessPermitted: bool: Boolean flag indicating whether API access is currently permitted based on rate limits and account status.

apiTier: venice_ai.types.api_keys.ApiTier: API tier configuration and billing information associated with this key.

balances: venice_ai.types.api_keys.Balances: Current account balances across supported currency types.

keyExpiration: typing.Optional[str]: ISO 8601 timestamp when the API key expires, or None if it never expires.

nextEpochBegins: str: ISO 8601 timestamp indicating when the next rate limiting epoch begins.

rateLimits: typing.List[venice_ai.types.api_keys.ApiKeyRateLimitItem]: Array of model-specific rate limiting rules and configurations applied to this key.
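The following sketch shows one way to read a RateLimitInfo-shaped dictionary before sending further requests. The helper `summarize_rate_limit_info` and the sample values (including the tier id) are made up for illustration; only the key names come from the type definition above.

```python
from typing import Any, Dict


def summarize_rate_limit_info(info: Dict[str, Any]) -> str:
    """Return a one-line summary of access, tier, balances, and expiration."""
    if not info["accessPermitted"]:
        return "access denied"
    balances = info["balances"]
    expiry = info.get("keyExpiration")
    expiry_text = "never expires" if expiry is None else f"expires {expiry}"
    return (
        f"tier={info['apiTier']['id']} "
        f"USD={balances['USD']:.2f} VCU={balances['VCU']:.2f} "
        f"{expiry_text}"
    )


# Hypothetical sample data shaped like RateLimitInfo.
sample_info = {
    "accessPermitted": True,
    "apiTier": {"id": "paid", "isCharged": True},
    "balances": {"USD": 10.5, "VCU": 3.0},
    "keyExpiration": None,
    "nextEpochBegins": "2025-05-02T00:00:00Z",
    "rateLimits": [],
}
summary = summarize_rate_limit_info(sample_info)
```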

class venice_ai.types.api_keys.RateLimitLog[source]

Represents a single rate limit event log entry.

This type defines the structure of individual rate limit log entries that track rate limiting events for API keys. Contains details about the key, model, tier, event type, and timing information for auditing and monitoring rate limit enforcement.

Retrieved from /api_keys/rate_limits/log endpoint.

apiKeyId: str: Unique identifier of the API key that triggered this rate limit event.

modelId: str: Identifier of the API model involved in the rate limiting event.

rateLimitTier: str: Rate limit tier that was active when this event occurred.

rateLimitType: str: Type of rate limit event (e.g., “exceeded”, “reset”, “warning”).

timestamp: str: ISO 8601 timestamp when this rate limit event was recorded.

class venice_ai.types.api_keys.RateLimitLogList[source]

Response payload containing a collection of rate limit log entries.

This type represents the response structure returned by the Venice AI API when retrieving rate limit logs. Provides a standardized container for multiple rate limit log entries with metadata indicating the response type.

Retrieved from GET /api_keys/rate_limits/log endpoint.

data: typing.List[venice_ai.types.api_keys.RateLimitLog]: Array of rate limit log entries containing event details and timestamps.

object: typing.Literal['list']: Response type identifier, always “list” for collection responses.

Type definitions for Venice AI Audio API.

This module contains TypedDict definitions and Enums for request objects in the Venice AI Audio API, covering the speech creation endpoint.

class venice_ai.types.audio.CreateSpeechRequest[source]

Request parameters for creating speech audio from text input.

This TypedDict defines the structure for requests to the POST /audio/speech endpoint, which converts text into spoken audio using specified voice characteristics and output format. The request allows customization of voice selection, audio format, playback speed, and user identification for tracking purposes.

model: ID of the model to use for speech generation (e.g., “tts-kokoro”).

input: The text to convert to speech. Maximum length varies by model.

voice: The voice to use for the generated audio. See Voice for available options.

response_format: Optional. The format to return the audio in. Defaults to “mp3”. See ResponseFormat for available formats.

speed: Optional. The speed of the generated audio. Select a value from 0.25 to 4.0. Defaults to 1.0.

user: Optional. A unique identifier representing the end-user, which can help Venice AI to monitor and detect abuse.
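A minimal sketch of building a request dictionary with the fields listed above, validating the documented speed range before sending. The helper `build_speech_request` is hypothetical; the model and voice values reuse examples that appear elsewhere in this reference.

```python
from typing import Any, Dict, Optional


def build_speech_request(
    text: str,
    voice: str,
    model: str = "tts-kokoro",
    response_format: Optional[str] = None,
    speed: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a CreateSpeechRequest-shaped dict, omitting unset optional fields."""
    if speed is not None and not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    request: Dict[str, Any] = {"model": model, "input": text, "voice": voice}
    if response_format is not None:
        request["response_format"] = response_format
    if speed is not None:
        request["speed"] = speed
    return request


speech_request = build_speech_request("Hello from Venice.", voice="af_alloy", speed=1.25)
```

Omitting unset optional keys lets the API apply its documented defaults ("mp3" and 1.0).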

class venice_ai.types.audio.ResponseFormat(*values)[source]

Available audio response formats for speech generation output.

This enumeration defines the supported audio file formats that can be requested when generating speech from text. The format determines the encoding, compression, and quality characteristics of the returned audio data from the text-to-speech endpoint. Different formats offer trade-offs between file size, quality, and compatibility.

class venice_ai.types.audio.Voice(*values)[source]

Available voices for speech generation in the Venice AI Audio API.

This enumeration defines the complete set of voice options that can be used when generating speech from text via the text-to-speech endpoint. Each voice represents different speaker characteristics including gender, accent, and vocal qualities. Voice names follow a pattern indicating language/region and gender (e.g., af for American Female, am for American Male).

class venice_ai.types.audio.VoiceDetail[source]

Detailed information about a single text-to-speech voice.

This TypedDict represents the structure of voice information returned by the get_voices() method. It contains metadata about a voice including its unique identifier, associated model, gender characteristics, and regional/language information derived from the voice ID.

id: The unique identifier for the voice as provided by the API (e.g., “af_alloy”, “zm_yunjian”).

model_id: The ID of the TTS model this voice is associated with (e.g., “tts-kokoro”).

gender: The perceived gender of the voice, parsed from the voice ID prefix. “unknown” if the prefix is not recognized or ambiguous.

region_code: The raw two-letter prefix from the voice ID that typically indicates region/language and gender (e.g., “af”, “zm”).

language: A descriptive name of the primary language associated with the voice, derived from the region_code (e.g., “American English”, “Mandarin Chinese”).

accent: A descriptive name of the accent or locale associated with the voice, derived from the region_code (e.g., “US”, “Standard Chinese”).
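To illustrate how fields such as region_code and gender could be derived from a voice ID, here is a rough sketch. It assumes the second letter of the two-letter prefix encodes gender ("f" or "m"), which matches the "af" and "am" examples above; the SDK's actual parsing logic and its exact gender labels may differ, and `parse_voice_id` is a hypothetical helper.

```python
from typing import Dict


def parse_voice_id(voice_id: str) -> Dict[str, str]:
    """Split a voice ID like 'af_alloy' into its prefix and a guessed gender."""
    prefix, _, _name = voice_id.partition("_")
    gender = "unknown"
    if len(prefix) == 2:
        # Assumption: second prefix letter is 'f' (female) or 'm' (male).
        gender = {"f": "female", "m": "male"}.get(prefix[1], "unknown")
    return {"id": voice_id, "region_code": prefix, "gender": gender}


alloy = parse_voice_id("af_alloy")
yunjian = parse_voice_id("zm_yunjian")
```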

class venice_ai.types.audio.VoiceList[source]

A list of voice details with optional filtering metadata.

This TypedDict represents the structure returned by the get_voices() method, containing a list of VoiceDetail objects along with metadata about any filters that were applied to generate the list. This follows the standard API pattern for list responses.

object: A string indicating the type of API object, always “list” for lists.

data: A list containing VoiceDetail objects.

model_id_filter: The model_id that was used to filter the voices, if any. None if no model ID filter was applied.

gender_filter: The gender that was used to filter the voices, if any. None if no gender filter was applied.

region_code_filter: The region_code (e.g., “af”, “zm”) that was used to filter the voices, if any. None if no region code filter was applied.

Type definitions for Venice AI Embeddings API.

This module contains TypedDict definitions for request and response objects in the Venice AI Embeddings API, covering the embeddings creation endpoint.

class venice_ai.types.embeddings.CreateEmbeddingRequest[source]

Request parameters for creating embeddings from text or token inputs.

This TypedDict defines the structure for requests to the POST /embeddings endpoint, which generates vector embeddings from input text or tokens using specified models. The embeddings can be used for semantic search, clustering, and similarity tasks.

model: ID of the embedding model to use.

input: Text or tokens to embed. Can be a string, list of strings, list of tokens, or list of token lists.

dimensions: Optional. Number of dimensions for the output embeddings.

encoding_format: Optional. Format for returned embeddings ("float" or "base64"). Defaults to "float".

user: Optional. Unique identifier for the end-user.

dimensions: typing.Optional[int]: The number of dimensions the resulting output embedding should have. Only supported in text-embedding-3 and later models. Optional parameter.

encoding_format: typing.Literal['float', 'base64']: The format to return the embeddings in. Can be "float" (default) or "base64". Optional parameter.

input: typing.Union[str, typing.List[str], typing.List[int], typing.List[typing.List[int]]]: Input text or tokens to embed.

model: str: ID of the model to use.

user: typing.Optional[str]: A unique identifier representing your end-user, which can help Venice AI monitor and detect abuse. Optional parameter.

class venice_ai.types.embeddings.Embedding[source]

Represents a single embedding vector with its metadata.

This TypedDict defines an individual embedding result containing the vector representation of input text along with its position index and object type. Each embedding is part of the response from the /embeddings endpoint and contains the actual numerical vector that can be used for similarity calculations.

embedding: Embedding vector as a list of floats or base64-encoded string.

index: Index of this embedding in the list.

object: Type of the object, always “embedding”.

embedding: typing.Union[typing.List[float], str]: The embedding vector, which is a list of floats or a base64-encoded string (when encoding_format="base64"). The length of the vector depends on the model used.

index: int: The index of the embedding in the list of embeddings.

object: typing.Literal['embedding']: The type of the object, which is always "embedding".
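Because the embedding field can be either a list of floats or a base64 string, consumers usually normalize it before computing similarity. The sketch below assumes the base64 payload is a packed array of little-endian 32-bit floats; this reference does not state the binary layout, so treat that as an assumption to verify. Both helper names are hypothetical.

```python
import base64
import math
import struct
from typing import List, Sequence, Union


def to_floats(embedding: Union[List[float], str]) -> List[float]:
    """Normalize an embedding value to a list of floats."""
    if isinstance(embedding, str):
        raw = base64.b64decode(embedding)
        # Assumption: payload is little-endian float32 values, 4 bytes each.
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))
    return list(embedding)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Round-trip a small vector through the assumed base64 layout.
encoded = base64.b64encode(struct.pack("<3f", 1.0, 0.0, 0.5)).decode("ascii")
decoded = to_floats(encoded)
```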

class venice_ai.types.embeddings.EmbeddingList[source]

Complete response from the embeddings creation endpoint.

This TypedDict represents the full response structure returned by the POST /embeddings endpoint, containing a list of embedding vectors, model information, and usage statistics. This is the primary response format for all embedding generation requests.

data: List of embedding objects.

model: Model used to generate the embeddings.

object: Type of the object, always “list”.

usage: Token usage statistics for the request.

data: typing.List[venice_ai.types.embeddings.Embedding]: The list of embeddings generated by the model.

model: str: The model ID used to generate the embeddings.

object: typing.Literal['list']: The object type, which is always "list".

usage: venice_ai.types.embeddings.EmbeddingUsage: The usage statistics for the request.

class venice_ai.types.embeddings.EmbeddingUsage[source]

Token usage statistics for an embedding request.

This TypedDict provides information about token consumption during the embedding generation process, including both prompt tokens and total tokens used. This information is useful for tracking API usage and costs.

prompt_tokens: Number of tokens in the input.

total_tokens: Total number of tokens used in the request.

prompt_tokens: int: The number of tokens used by the input prompt.

total_tokens: int: The total number of tokens used by the request.

Type definitions for Venice AI Billing API.

This module contains TypedDict definitions for request parameters and response objects in the Venice AI Billing API, covering the billing usage endpoint.

class venice_ai.types.billing.BillingFormatEnum(*values)[source]

Defines available output formats for billing usage data responses.

This enumeration specifies the supported data formats that can be requested when retrieving billing usage information from the Venice AI API. Different formats may be suitable for different use cases, such as programmatic processing or data export.

Used to specify the desired response format in billing usage API requests.

CSV = 'csv': CSV format - returns raw CSV data as bytes for export purposes.

JSON = 'json': JSON format - returns structured data as BillingUsageResponse.

class venice_ai.types.billing.BillingUsageEntry[source]

Represents a single billing usage record from the Venice AI API.

This model defines the structure of individual usage entries returned by the billing usage endpoint. Each entry represents a billable event with associated costs, units consumed, and metadata about the API usage.

Used as the primary data structure for tracking and reporting API usage costs across different services and time periods.

amount: float: Total amount charged for this usage entry.

currency: typing.Literal['USD', 'VCU']: Currency denomination for the charge (either USD or Venice Compute Units).

inferenceDetails: typing.Optional[venice_ai.types.billing.InferenceDetails]: Detailed inference metadata, present only for LLM-related usage entries.

notes: str: Additional notes or description associated with this billing entry.

pricePerUnitUsd: float: Price per unit in USD for this specific usage type.

sku: str: Stock Keeping Unit (SKU) identifier for the product or service used.

timestamp: str: ISO 8601 formatted timestamp indicating when this usage occurred.

units: float: Quantity of units consumed for this billing entry.

class venice_ai.types.billing.BillingUsagePagination[source]

Represents pagination metadata for billing usage API responses.

This model contains information about the pagination state of billing usage queries, including current page position, total available records, and pagination limits. Used in conjunction with billing usage responses to enable efficient navigation through large datasets.

Essential for handling paginated billing data retrieval from the Venice AI API.

limit: float: Maximum number of items returned per page in the current request.

page: float: Current page number in the paginated result set (1-based).

total: float: Total number of billing usage entries available across all pages.

totalPages: float: Total number of pages available for the current query parameters.
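Since the pagination fields are typed as float, a caller stepping through pages may want to cast them to integers first. A small sketch, with `next_page` as a hypothetical helper:

```python
from typing import Any, Dict, Optional


def next_page(pagination: Dict[str, Any]) -> Optional[int]:
    """Return the next 1-based page number, or None when on the last page."""
    page = int(pagination["page"])
    total_pages = int(pagination["totalPages"])
    return page + 1 if page < total_pages else None


# Hypothetical pagination metadata for a three-page result set.
first = {"limit": 200.0, "page": 1.0, "total": 450.0, "totalPages": 3.0}
last = {"limit": 200.0, "page": 3.0, "total": 450.0, "totalPages": 3.0}
```

A loop can then request pages until `next_page` returns None.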

class venice_ai.types.billing.BillingUsageRequestParams[source]

Represents query parameters for filtering and paginating billing usage data.

This model defines the optional parameters that can be used to customize billing usage queries to the Venice AI API. Supports filtering by date range, currency type, and pagination controls to retrieve specific subsets of billing data.

All parameters are optional and will use API defaults when not specified. Used to construct targeted billing usage requests based on specific criteria.

currency: typing.Optional[typing.Literal['USD', 'VCU']]: Filter results by currency type (USD for US Dollars, VCU for Venice Compute Units).

endDate: typing.Optional[str]: End date for the billing period filter in ISO 8601 format (e.g., “2025-05-01T00:00:00Z”).

limit: typing.Optional[int]: Maximum number of items to return per page (valid range: 1-500, default: 200).

page: typing.Optional[int]: Page number for pagination, starting from 1 (default: 1).

sortOrder: typing.Optional[typing.Literal['asc', 'desc']]: Sort order for results by timestamp (ascending or descending, default: ‘desc’).

startDate: typing.Optional[str]: Start date for the billing period filter in ISO 8601 format (e.g., “2025-01-01T00:00:00Z”).
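The sketch below builds a parameter dictionary with the documented keys, formatting dates as ISO 8601 UTC strings and checking the documented limit and page ranges locally. The helper names are hypothetical, and the client-side validation is an illustrative choice rather than SDK behavior.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def to_iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime like 2025-01-01T00:00:00Z."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_usage_params(
    start: datetime,
    end: datetime,
    currency: Optional[str] = None,
    limit: int = 200,
    page: int = 1,
    sort_order: str = "desc",
) -> Dict[str, Any]:
    """Build a BillingUsageRequestParams-shaped dict."""
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    if page < 1:
        raise ValueError("page starts at 1")
    if sort_order not in ("asc", "desc"):
        raise ValueError("sort_order must be 'asc' or 'desc'")
    params: Dict[str, Any] = {
        "startDate": to_iso_utc(start),
        "endDate": to_iso_utc(end),
        "limit": limit,
        "page": page,
        "sortOrder": sort_order,
    }
    if currency is not None:
        params["currency"] = currency  # 'USD' or 'VCU'
    return params


usage_params = build_usage_params(
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    datetime(2025, 5, 1, tzinfo=timezone.utc),
    currency="USD",
)
```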

class venice_ai.types.billing.BillingUsageResponse[source]

Represents the complete response structure from the billing usage endpoint.

This model serves as the top-level container for billing usage data returned by the Venice AI API. It combines the actual usage records with pagination metadata, providing a comprehensive view of billing information for a given query.

Used as the primary response type for all billing usage API calls.

data: typing.List[venice_ai.types.billing.BillingUsageEntry]: Array of billing usage records for the requested time period and filters.

pagination: venice_ai.types.billing.BillingUsagePagination: Pagination metadata including current page, total items, and page limits.
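As an illustration of consuming this structure, the following sketch totals the amount field per currency across the entries in data. The helper `total_by_currency` and the sample entries (including the SKU strings) are invented for the example.

```python
from collections import defaultdict
from typing import Any, Dict


def total_by_currency(response: Dict[str, Any]) -> Dict[str, float]:
    """Sum BillingUsageEntry amounts grouped by their currency field."""
    totals: Dict[str, float] = defaultdict(float)
    for entry in response["data"]:
        totals[entry["currency"]] += entry["amount"]
    return dict(totals)


# Hypothetical response with only the fields this helper reads.
sample_response = {
    "data": [
        {"amount": 0.25, "currency": "USD", "sku": "example-sku-a"},
        {"amount": 0.5, "currency": "USD", "sku": "example-sku-a"},
        {"amount": 2.0, "currency": "VCU", "sku": "example-sku-b"},
    ],
    "pagination": {"limit": 200.0, "page": 1.0, "total": 3.0, "totalPages": 1.0},
}
totals = total_by_currency(sample_response)
```

For multi-page results, the same accumulation would need to run over every page.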

class venice_ai.types.billing.InferenceDetails[source]

Represents detailed information about an inference request for billing purposes.

This model contains metadata about LLM inference requests, including token counts and execution metrics. Used within billing usage entries to provide granular details about API usage costs and performance.

Note

These details are only present for LLM usage entries and may be absent for other types of API usage.

completionTokens: typing.Optional[float]: Number of tokens generated in the completion response (present only for LLM inference requests).

inferenceExecutionTime: typing.Optional[float]: Total execution time for the inference request in milliseconds.

promptTokens: typing.Optional[float]: Number of tokens in the input prompt (present only for LLM inference requests).

requestId: typing.Optional[str]: Unique identifier for the specific inference request.

class venice_ai.types.characters.Character(**data: Any)[source]

Represents an AI character definition in the Venice AI system.

This model defines a complete AI character with all its attributes including metadata, configuration settings, and behavioral parameters. Characters are used in chat completions and other AI interactions to provide specific personalities, knowledge bases, and response styles.

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class venice_ai.types.characters.CharacterList(**data: Any)[source]

Represents a paginated collection of AI characters.

This model serves as a container for multiple character objects, typically returned by character listing and search API endpoints. It follows the standard API response format with a data array containing character objects and an object type identifier for response validation and parsing.

Parameters: data (typing.Any)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

Exceptions¶

exception venice_ai.exceptions.APIConnectionError(message: str = 'Connection error', *, original_error: Exception | None = None, request: Any | None = None, response: httpx.Response | None = None)[source]

Raised when there’s an issue connecting to the Venice AI API.

This exception is raised when network-level connectivity issues prevent the client from establishing a connection to the API server. This could be due to:

Network connectivity problems
DNS resolution failures
Connection timeouts during establishment
SSL/TLS handshake failures
Proxy configuration issues

Parameters:

message (str) – Human-readable description of the error. Defaults to “Connection error”.
original_error (Optional[Exception]) – Optional. The original exception that caused this error.
request (Optional[Any]) – Optional. The httpx.Request object associated with the error.
response (Optional[httpx.Response]) – Optional. The httpx.Response object if available.

Variables:

original_error (Optional[Exception]) – Original exception that caused this error.
request (Optional[Any]) – httpx.Request object associated with the error, if available.

exception venice_ai.exceptions.APIError(message: str, *, request: httpx.Request | None = None, response: httpx.Response, body: Any | None = None)[source]

Raised when the API returns a non-2xx status code.

This is a general exception for API-related errors, including server errors (5xx status codes) and unhandled client errors (4xx status codes). More specific exception subclasses are available for common HTTP status codes.

Parameters:

message (str) – Human-readable description of the error.
request (Optional[httpx.Request]) – Optional. The httpx.Request object that led to the error.
response (httpx.Response) – The httpx.Response object from the API call.
body (Optional[Any]) – Optional. The parsed response body, if available.

Variables:

status_code (int) – HTTP status code of the response.
body (Optional[Any]) – Parsed response body, if available.
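The status-code-specific exceptions documented in this section can be summarized as a lookup from HTTP status to class name. The sketch below is a reading aid built from the descriptions in this reference, not the SDK's own dispatch code; in particular, it assumes 503 resolves to ServiceUnavailableError rather than InternalServerError, since both entries mention that code.

```python
# Status codes named in this reference, mapped to exception class names.
_STATUS_TO_EXCEPTION = {
    400: "InvalidRequestError",
    401: "AuthenticationError",
    402: "PaymentRequiredError",
    403: "PermissionDeniedError",
    404: "NotFoundError",
    409: "ConflictError",
    413: "InvalidRequestError",  # file size exceeds limits
    415: "InvalidRequestError",  # unsupported content type
    429: "RateLimitError",
    503: "ServiceUnavailableError",  # assumption: the specific class wins
}


def exception_name_for_status(status_code: int) -> str:
    """Return the documented exception class name for an HTTP status code."""
    if status_code in _STATUS_TO_EXCEPTION:
        return _STATUS_TO_EXCEPTION[status_code]
    if 500 <= status_code <= 599:
        return "InternalServerError"
    return "APIError"  # general fallback for other non-2xx codes
```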

exception venice_ai.exceptions.APIResponseProcessingError(message: str, *, original_error: Exception | None = None, response: httpx.Response | None = None)[source]

Raised when there’s an error processing the API response.

This exception is raised when the client successfully receives a response from the Venice AI API but encounters an error while processing the response data. This could be due to:

Unexpected response format or structure
JSON parsing errors
Missing expected fields in the response
Data type conversion failures
Response validation errors

Parameters:

message (str) – Human-readable description of the error.
original_error (Optional[Exception]) – Optional. The original exception that caused this error.
response (Optional[httpx.Response]) – Optional. The httpx.Response object if available.

Variables:

original_error (Optional[Exception]) – Original exception that caused this error.

exception venice_ai.exceptions.APITimeoutError(message: str = 'Request timed out', *, original_error: Exception | None = None, request: Any | None = None, response: httpx.Response | None = None)[source]

Raised when an API request times out.

This exception is raised when a request to the Venice AI API takes too long to complete and exceeds the configured timeout limit. This can occur during:

Long-running operations (e.g., image generation, large file processing)
Network latency issues
Server processing delays
Read timeout while waiting for response data

Parameters:

message (str) – Human-readable description of the error. Defaults to “Request timed out”.
original_error (Optional[Exception]) – Optional. The original exception that caused this error.
request (Optional[Any]) – Optional. The httpx.Request object associated with the error.
response (Optional[httpx.Response]) – Optional. The httpx.Response object if available.

Variables:

original_error (Optional[Exception]) – Original exception that caused this error.
request (Optional[Any]) – httpx.Request object associated with the error, if available.

exception venice_ai.exceptions.AuthenticationError(message: str, *, request: httpx.Request | None = None, response: httpx.Response, body: Any | None = None)[source]

Raised for 401 Unauthorized errors, typically due to an invalid API key.

This exception is raised when the API returns a 401 status code, indicating that the request lacks valid authentication credentials. This commonly occurs when:

The API key is missing or invalid
The API key has been revoked or expired
The API key lacks the necessary permissions for the requested operation

Parameters:

message (str) – Human-readable description of the error.
request (Optional[httpx.Request]) – Optional. The httpx.Request object that led to the error.
response (httpx.Response) – The httpx.Response object from the API call.
body (Optional[Any]) – Optional. The parsed response body, if available.

exception venice_ai.exceptions.ConflictError(message: str, *, request: httpx.Request | None = None, response: httpx.Response, body: Any | None = None)[source]

Raised for 409 Conflict errors when a resource conflict occurs.

This exception is raised when the API returns a 409 status code, indicating that the request could not be completed due to a conflict with the current state of the resource. This may occur when:

Attempting to create a resource that already exists
Concurrent modifications to the same resource
Business logic constraints prevent the operation

Parameters:

message (str) – Human-readable description of the error.
request (Optional[httpx.Request]) – Optional. The httpx.Request object that led to the error.
response (httpx.Response) – The httpx.Response object from the API call.
body (Optional[Any]) – Optional. The parsed response body, if available.

exception venice_ai.exceptions.InternalServerError(message: str, *, request: httpx.Request | None = None, response: httpx.Response, body: Any | None = None)[source]

Raised for 500 Internal Server Error and other 5xx server-side errors.

This exception is raised when the API returns a 5xx status code, indicating that an error occurred on the API server’s end. This includes various server-side failures such as:

Internal server errors (500)
Service unavailable (503)
Gateway timeout (504)
Inference failures
Upscale failures
Unknown server errors

Parameters:

message (str) – Human-readable description of the error.
request (Optional[httpx.Request]) – Optional. The httpx.Request object that led to the error.
response (httpx.Response) – The httpx.Response object from the API call.
body (Optional[Any]) – Optional. The parsed response body, if available.

exception venice_ai.exceptions.InvalidRequestError(message: str, *, request: httpx.Request | None = None, response: httpx.Response, body: Any | None = None)[source]

Raised for 400 Bad Request errors due to invalid request parameters.

This exception is raised when the API returns a 400 status code, indicating that the server cannot process the request due to client error. This typically occurs when:

Required fields are missing from the request
Parameter values are invalid or malformed
The request payload format is incorrect
File size exceeds limits (413 status code also maps to this exception)
Unsupported content type (415 status code also maps to this exception)

Parameters:

message (str) – Human-readable description of the error.
request (Optional[httpx.Request]) – Optional. The httpx.Request object that led to the error.
response (httpx.Response) – The httpx.Response object from the API call.
body (Optional[Any]) – Optional. The parsed response body, if available.

exception venice_ai.exceptions.MissingStreamClassError(message: str, *, request: httpx.Request | None = None, response: httpx.Response | None = None)[source]

Raised when stream=True but no stream_cls is provided.

This exception is raised when attempting to use streaming functionality but the required stream class parameter is not provided. This typically occurs in chat completions or other streaming operations where the client needs to know how to handle the streamed response data.

Parameters:

message (str) – Human-readable description of the error.
request (typing.Optional[httpx.Request])
response (typing.Optional[httpx.Response])

exception venice_ai.exceptions.NotFoundError(message: str, *, request: httpx.Request | None = None, response: httpx.Response, body: Any | None = None)[source]

Raised for 404 Not Found errors when a requested resource is not found.

This exception is raised when the API returns a 404 status code, indicating that the requested resource could not be found. This commonly occurs when:

An incorrect model name is specified
A character slug does not exist
An API endpoint path is invalid
A resource identifier (ID) is not found

Parameters:

message (str) – Human-readable description of the error.
request (Optional[httpx.Request]) – Optional. The httpx.Request object that led to the error.
response (httpx.Response) – The httpx.Response object from the API call.
body (Optional[Any]) – Optional. The parsed response body, if available.

exception venice_ai.exceptions.PaymentRequiredError(message: str, *, request: httpx.Request | None = None, response: httpx.Response, body: Any | None = None)[source]

Raised for 402 Payment Required errors when there is insufficient balance.

This exception is raised when the API returns a 402 status code, indicating that the request cannot be processed due to insufficient USD or VCU (Venice Compute Units) balance in the account. The client should add funds or credits before retrying.

Parameters:

message (str) – Human-readable description of the error.
request (Optional[httpx.Request]) – Optional. The httpx.Request object that led to the error.
response (httpx.Response) – The httpx.Response object from the API call.
body (Optional[Any]) – Optional. The parsed response body, if available.

exception venice_ai.exceptions.PermissionDeniedError(message: str, *, request: httpx.Request | None = None, response: httpx.Response, body: Any | None = None)[source]

Raised for 403 Forbidden errors when access is denied.

This exception is raised when the API returns a 403 status code, indicating that the client does not have permission to perform the requested action. The request was valid and authenticated, but the server is refusing to authorize it.

Parameters:

message (str) – Human-readable description of the error.
request (Optional[httpx.Request]) – Optional. The httpx.Request object that led to the error.
response (httpx.Response) – The httpx.Response object from the API call.
body (Optional[Any]) – Optional. The parsed response body, if available.

exception venice_ai.exceptions.RateLimitError(message: str, *, request: httpx.Request | None = None, response: httpx.Response, body: Any | None = None, retry_after_seconds: int | None = None)[source]

Raised for 429 Too Many Requests errors when rate limits are exceeded.

This exception is raised when the API returns a 429 status code, indicating that the client has sent too many requests in a given time frame and has exceeded the rate limit. The client should wait before making additional requests.

Parameters:

message (str) – Human-readable description of the error.
request (Optional[httpx.Request]) – Optional. The httpx.Request object that led to the error.
response (httpx.Response) – The httpx.Response object from the API call.
body (Optional[Any]) – Optional. The parsed response body, if available.
retry_after_seconds (Optional[int]) – Optional. The number of seconds to wait before retrying, parsed from the Retry-After header.

Variables:

retry_after_seconds (Optional[int]) – Number of seconds to wait before retrying, if available.
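
The retry_after_seconds attribute can drive a wait-and-retry loop once the client's built-in retries are exhausted. The sketch below is one possible shape, not the library's own mechanism. `wait_seconds` and `call_with_rate_limit_wait` are hypothetical helpers, and the exception class is passed in as an argument so the example runs without the SDK installed.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Type, TypeVar

T = TypeVar("T")


def wait_seconds(exc: BaseException, fallback: float = 1.0) -> float:
    """Return how long to pause after a rate-limit error.

    Reads the documented retry_after_seconds attribute and uses
    `fallback` when the attribute is missing or None.
    """
    retry_after: Optional[int] = getattr(exc, "retry_after_seconds", None)
    return float(retry_after) if retry_after is not None else fallback


async def call_with_rate_limit_wait(
    make_request: Callable[[], Awaitable[T]],
    rate_limit_error: Type[BaseException],
    max_attempts: int = 3,
) -> T:
    """Await make_request(), sleeping between attempts on rate_limit_error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_request()
        except rate_limit_error as exc:
            if attempt == max_attempts:
                raise  # Out of attempts: let the caller see the error.
            await asyncio.sleep(wait_seconds(exc))
    # Only reachable when max_attempts is less than 1.
    raise RuntimeError("max_attempts must be at least 1")
```

With the SDK installed, `rate_limit_error` would presumably be venice_ai.exceptions.RateLimitError and `make_request` a zero-argument coroutine function wrapping the API call.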

exception venice_ai.exceptions.ServiceUnavailableError(message: str, *, request: httpx.Request | None = None, response: httpx.Response, body: Any | None = None)[source]

Raised for 503 Service Unavailable errors when the service is temporarily unavailable.

This exception is raised when the API returns a 503 status code, indicating that the service is temporarily unavailable. This commonly occurs when:

- The requested model is at capacity
- The service is undergoing maintenance
- The server is temporarily overloaded

Clients should implement retry logic with exponential backoff when encountering this error.

Parameters:

message (str) – Human-readable description of the error.
request (Optional[httpx.Request]) – Optional. The httpx.Request object that led to the error.
response (httpx.Response) – The httpx.Response object from the API call.
body (Optional[Any]) – Optional. The parsed response body, if available.
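
The exponential backoff recommended above can be sketched as a delay schedule. This is an illustrative helper, not the library's retry implementation. `backoff_delays` and its parameters are assumptions, with `base=0.5` chosen only to echo the client's documented retry_backoff_factor default.

```python
import random
from typing import List


def backoff_delays(
    attempts: int,
    base: float = 0.5,
    cap: float = 30.0,
    jitter: float = 0.0,
) -> List[float]:
    """Return the delay in seconds to wait before each retry.

    The n-th delay is base * 2**n, limited to `cap`, plus up to `jitter`
    seconds of random noise so that many clients do not retry in lockstep.
    """
    delays = []
    for n in range(attempts):
        delay = min(cap, base * (2 ** n))
        delays.append(delay + random.uniform(0.0, jitter))
    return delays


# With no jitter the schedule doubles on every attempt.
print(backoff_delays(4))
```

A caller would sleep for each delay in turn between attempts and re-raise ServiceUnavailableError once the schedule is exhausted.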

exception venice_ai.exceptions.StreamClosedError(message: str = 'Connection error', *, original_error: Exception | None = None, request: Any | None = None, response: httpx.Response | None = None)[source]

Raised when an attempt is made to operate on a stream whose underlying connection has been closed.

This exception is raised when trying to iterate over a stream whose underlying httpx.Response object has been closed. Once a stream’s underlying connection is closed, it cannot be iterated.

Parameters:

message (str) – Human-readable description of the error.
request (Optional[httpx.Request]) – Optional. The httpx.Request object associated with the error.
response (Optional[httpx.Response]) – Optional. The httpx.Response object if available.
original_error (Optional[Exception]) – Optional. The underlying exception that caused this error, if any.

exception venice_ai.exceptions.StreamConsumedError(message: str = 'Connection error', *, original_error: Exception | None = None, request: Any | None = None, response: httpx.Response | None = None)[source]

Raised when an attempt is made to operate on a stream that has already been consumed.

This exception is raised when trying to iterate over a stream that has already been fully consumed or exhausted. Once a stream has been consumed, it cannot be re-iterated.

Parameters:

message (str) – Human-readable description of the error.
request (Optional[httpx.Request]) – Optional. The httpx.Request object associated with the error.
response (Optional[httpx.Response]) – Optional. The httpx.Response object if available.
original_error (Optional[Exception]) – Optional. The underlying exception that caused this error, if any.
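
Because a stream can be iterated only once, code that needs the chunks more than once can buffer them during the first pass. The sketch below assumes the stream is an async iterator and uses a plain async generator as a stand-in. `collect` and `fake_stream` are hypothetical names. Note that a plain async generator silently yields nothing on a second pass, whereas the library raises StreamConsumedError as described above.

```python
import asyncio
from typing import AsyncIterator, List, TypeVar

T = TypeVar("T")


async def collect(stream: AsyncIterator[T]) -> List[T]:
    """Drain an async iterator into a list so the items can be reused."""
    return [item async for item in stream]


async def fake_stream() -> AsyncIterator[str]:
    """Stand-in for a streamed API response (assumption for this sketch)."""
    for chunk in ("Hel", "lo", "!"):
        yield chunk


async def main() -> List[str]:
    chunks = await collect(fake_stream())
    # The buffered list can be read any number of times.
    return chunks + chunks


print("".join(asyncio.run(main())))
```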

exception venice_ai.exceptions.UnprocessableEntityError(message: str, *, request: httpx.Request | None = None, response: httpx.Response, body: Any | None = None)[source]

Raised for 422 Unprocessable Entity errors due to validation failures.

This exception is raised when the API returns a 422 status code, indicating that the request was well-formed but contained semantic errors that prevented it from being processed. This typically occurs when:

- Request data fails server-side validation rules
- Business logic constraints are violated
- Data format is correct but values are semantically invalid

Parameters:

message (str) – Human-readable description of the error.
request (Optional[httpx.Request]) – Optional. The httpx.Request object that led to the error.
response (httpx.Response) – The httpx.Response object from the API call.
body (Optional[Any]) – Optional. The parsed response body, if available.

exception venice_ai.exceptions.VeniceError(message: str, *, request: httpx.Request | None = None, response: httpx.Response | None = None)[source]

Base exception for all errors raised by the Venice AI client.

This is the parent class for all custom exceptions in the Venice AI library. All other exception classes inherit from this base exception.

Parameters:

message (str) – Human-readable description of the error.
request (Optional[httpx.Request]) – Optional. The httpx.Request object associated with the error.
response (Optional[httpx.Response]) – Optional. The httpx.Response object if the error originated from an API call.

Variables:

message (str) – Human-readable description of the error.
request_obj (Optional[httpx.Request]) – The httpx.Request object associated with the error, if available.
response_obj (Optional[httpx.Response]) – The httpx.Response object if the error originated from an API call.
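
Since every library exception inherits from VeniceError, a single handler can report any of them using the documented message and response_obj attributes. The helper below is a sketch under that assumption. `describe_error` is a hypothetical name, and it reads attributes with getattr so it also tolerates exceptions that lack them.

```python
from typing import Optional


def describe_error(exc: BaseException) -> str:
    """Build a one-line summary from an exception's documented attributes."""
    response = getattr(exc, "response_obj", None)
    # httpx.Response exposes the HTTP status as .status_code.
    status: Optional[int] = getattr(response, "status_code", None)
    message = getattr(exc, "message", str(exc))
    name = type(exc).__name__
    if status is None:
        return f"{name}: {message}"
    return f"{name} (HTTP {status}): {message}"
```

In application code this would typically sit inside an `except VeniceError as exc:` block placed after any handlers for more specific subclasses.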

Utilities¶

For utility functions provided by the library, please see the Client Utilities page.
