konfig/carbon-php-sdk
Latest stable version: v0.2.53
Composer install command:
composer require konfig/carbon-php-sdk
Package description
Connect external data to LLMs, no matter the source.
README
Carbon
Connect external data to LLMs, no matter the source.
Table of Contents
- Installation & Usage
- Getting Started
- Reference
- carbon.auth: getAccessToken, getWhiteLabeling
- carbon.cRM: getAccount, getAccounts, getContact, getContacts, getLead, getLeads, getOpportunities, getOpportunity
- carbon.dataSources: addTags, query, queryUserDataSources, removeTags, revokeAccessToken
- carbon.embeddings: all, getDocuments, getEmbeddingsAndChunks, uploadChunksAndEmbeddings
- carbon.files: createUserFileTags, deleteFileTags, deleteMany, deleteV2, getParsedFile, getRawFile, modifyColdStorageParameters, moveToHotStorage, queryUserFiles, queryUserFilesDeprecated, resync, upload, uploadFromUrl, uploadText
- carbon.github: getIssue, getIssues, getPr, getPrComments, getPrCommits, getPrFiles, getPullRequests
- carbon.integrations: cancel, connectDataSource, connectDocument360, connectFreshdesk, connectGitbook, connectGuru, createAwsIamUser, getOauthUrl, listConfluencePages, listConversations, listDataSourceItems, listFolders, listGitbookSpaces, listLabels, listOutlookCategories, listRepos, listSharepointSites, syncAzureBlobFiles, syncAzureBlobStorage, syncConfluence, syncDataSourceItems, syncFiles, syncGitHub, syncGitbook, syncGmail, syncOutlook, syncRepos, syncRssFeed, syncS3Files, syncSlack
- carbon.organizations: get, update, updateStats
- carbon.users: all, delete, get, toggleUserFeatures, updateUsers, whoAmI
- carbon.utilities: fetchUrls, fetchWebpage, fetchYoutubeTranscripts, processSitemap, scrapeSitemap, scrapeWeb, searchUrls, userWebpages
- carbon.webhooks: addUrl, deleteUrl, urls
- carbon.whiteLabel: all, create, delete, update
Installation & Usage
Requirements
This library requires PHP ^8.0
Composer
To install the bindings via Composer, add the following to composer.json:
{
"repositories": [
{
"type": "vcs",
"url": "https://github.com/Carbon-for-Developers/carbon-php-sdk.git"
}
],
"require": {
"konfig/carbon-php-sdk": "0.2.53"
}
}
Then run composer install
Manual Installation
Download the files and include autoload.php:
<?php require_once('/path/to/carbon-php-sdk/vendor/autoload.php');
Getting Started
Please follow the installation procedure and then run the following:
<?php
require_once(__DIR__ . '/vendor/autoload.php');

// 1) Get an access token for a customer
$carbon = new \Carbon\Client(
    apiKey: "API_KEY",
    customerId: "CUSTOMER_ID",
);
$token = $carbon->auth->getAccessToken();

// 2) Use the access token to authenticate moving forward
$carbon = new \Carbon\Client(accessToken: $token->getAccessToken());

// Use the SDK as usual
$whiteLabeling = $carbon->auth->getWhiteLabeling();
// etc.
Reference
carbon.auth.getAccessToken
Get Access Token
🛠️ Usage
$result = $carbon->auth->getAccessToken();
🔄 Return
🌐 Endpoint
/auth/v1/access_token GET
carbon.auth.getWhiteLabeling
Returns whether or not the organization is white labeled and which integrations are white labeled
🛠️ Usage
$result = $carbon->auth->getWhiteLabeling();
🔄 Return
🌐 Endpoint
/auth/v1/white_labeling GET
carbon.cRM.getAccount
Get Account
🛠️ Usage
$result = $carbon->cRM->getAccount( id: "id_example", data_source_id: 1, include_remote_data: false, includes: [ "string_example" ] );
⚙️ Parameters
id: string
data_source_id: int
include_remote_data: bool
includes: []
🔄 Return
🌐 Endpoint
/integrations/data/crm/accounts/{id} GET
carbon.cRM.getAccounts
Get Accounts
🛠️ Usage
$result = $carbon->cRM->getAccounts( data_source_id: 1, include_remote_data: false, next_cursor: "string_example", page_size: 1, order_dir: "asc", includes: [], filters: [ ], order_by: "created_at" );
⚙️ Parameters
data_source_id: int
include_remote_data: bool
next_cursor: string
page_size: int
order_dir:
includes: []
filters: AccountFilters
order_by:
🔄 Return
🌐 Endpoint
/integrations/data/crm/accounts POST
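Because getAccounts is cursor-paginated via next_cursor and page_size, all pages can be walked by feeding each response's cursor into the next request. A sketch, assuming the response model exposes the cursor through a getNextCursor() accessor (the actual accessor name may differ in the generated models):
// Hypothetical pagination loop over all CRM accounts.
$cursor = null;
do {
    $page = $carbon->cRM->getAccounts(
        data_source_id: 1,
        page_size: 100,
        next_cursor: $cursor,
    );
    // ... process $page ...
    $cursor = $page->getNextCursor(); // assumed accessor; check the generated response model
} while ($cursor !== null);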
carbon.cRM.getContact
Get Contact
🛠️ Usage
$result = $carbon->cRM->getContact( id: "id_example", data_source_id: 1, include_remote_data: false, includes: [ "string_example" ] );
⚙️ Parameters
id: string
data_source_id: int
include_remote_data: bool
includes: []
🔄 Return
🌐 Endpoint
/integrations/data/crm/contacts/{id} GET
carbon.cRM.getContacts
Get Contacts
🛠️ Usage
$result = $carbon->cRM->getContacts( data_source_id: 1, include_remote_data: false, next_cursor: "string_example", page_size: 1, order_dir: "asc", includes: [], filters: [ ], order_by: "created_at" );
⚙️ Parameters
data_source_id: int
include_remote_data: bool
next_cursor: string
page_size: int
order_dir:
includes: []
filters: ContactFilters
order_by:
🔄 Return
🌐 Endpoint
/integrations/data/crm/contacts POST
carbon.cRM.getLead
Get Lead
🛠️ Usage
$result = $carbon->cRM->getLead( id: "id_example", data_source_id: 1, include_remote_data: false, includes: [ "string_example" ] );
⚙️ Parameters
id: string
data_source_id: int
include_remote_data: bool
includes: []
🔄 Return
🌐 Endpoint
/integrations/data/crm/leads/{id} GET
carbon.cRM.getLeads
Get Leads
🛠️ Usage
$result = $carbon->cRM->getLeads( data_source_id: 1, include_remote_data: false, next_cursor: "string_example", page_size: 1, order_dir: "asc", includes: [], filters: [ ], order_by: "created_at" );
⚙️ Parameters
data_source_id: int
include_remote_data: bool
next_cursor: string
page_size: int
order_dir:
includes: []
filters: LeadFilters
order_by:
🔄 Return
🌐 Endpoint
/integrations/data/crm/leads POST
carbon.cRM.getOpportunities
Get Opportunities
🛠️ Usage
$result = $carbon->cRM->getOpportunities( data_source_id: 1, include_remote_data: false, next_cursor: "string_example", page_size: 1, order_dir: "asc", includes: [], filters: [ "status" => "WON", ], order_by: "created_at" );
⚙️ Parameters
data_source_id: int
include_remote_data: bool
next_cursor: string
page_size: int
order_dir:
includes: []
filters: OpportunityFilters
order_by:
🔄 Return
🌐 Endpoint
/integrations/data/crm/opportunities POST
carbon.cRM.getOpportunity
Get Opportunity
🛠️ Usage
$result = $carbon->cRM->getOpportunity( id: "id_example", data_source_id: 1, include_remote_data: false, includes: [ "string_example" ] );
⚙️ Parameters
id: string
data_source_id: int
include_remote_data: bool
includes: []
🔄 Return
🌐 Endpoint
/integrations/data/crm/opportunities/{id} GET
carbon.dataSources.addTags
Add Data Source Tags
🛠️ Usage
$result = $carbon->dataSources->addTags( tags: [], data_source_id: 1 );
⚙️ Parameters
tags: object
data_source_id: int
🔄 Return
🌐 Endpoint
/data_sources/tags/add POST
carbon.dataSources.query
Data Sources
🛠️ Usage
$result = $carbon->dataSources->query( pagination: [ "limit" => 10, "offset" => 0, "starting_id" => 0, ], order_by: "created_at", order_dir: "desc", filters: [ "source" => "GOOGLE_CLOUD_STORAGE", ] );
⚙️ Parameters
pagination: Pagination
order_by:
order_dir:
filters: OrganizationUserDataSourceFilters
🔄 Return
OrganizationUserDataSourceResponse
🌐 Endpoint
/data_sources POST
carbon.dataSources.queryUserDataSources
User Data Sources
🛠️ Usage
$result = $carbon->dataSources->queryUserDataSources( pagination: [ "limit" => 10, "offset" => 0, "starting_id" => 0, ], order_by: "created_at", order_dir: "desc", filters: [ "source" => "GOOGLE_CLOUD_STORAGE", ] );
⚙️ Parameters
pagination: Pagination
order_by:
order_dir:
filters: OrganizationUserDataSourceFilters
🔄 Return
OrganizationUserDataSourceResponse
🌐 Endpoint
/user_data_sources POST
carbon.dataSources.removeTags
Remove Data Source Tags
🛠️ Usage
$result = $carbon->dataSources->removeTags( data_source_id: 1, tags_to_remove: [], remove_all_tags: false );
⚙️ Parameters
data_source_id: int
tags_to_remove: string[]
remove_all_tags: bool
🔄 Return
🌐 Endpoint
/data_sources/tags/remove POST
carbon.dataSources.revokeAccessToken
Revoke Access Token
🛠️ Usage
$result = $carbon->dataSources->revokeAccessToken( data_source_id: 1 );
⚙️ Parameters
data_source_id: int
🔄 Return
🌐 Endpoint
/revoke_access_token POST
carbon.embeddings.all
Retrieve Embeddings And Content V2
🛠️ Usage
$result = $carbon->embeddings->all( filters: [ "include_all_children" => false, "non_synced_only" => false, ], pagination: [ "limit" => 10, "offset" => 0, "starting_id" => 0, ], order_by: "created_at", order_dir: "desc", include_vectors: false );
⚙️ Parameters
filters: OrganizationUserFilesToSyncFilters
pagination: Pagination
order_by:
order_dir:
include_vectors: bool
🔄 Return
🌐 Endpoint
/list_chunks_and_embeddings POST
carbon.embeddings.getDocuments
For pre-filtering documents, using tags_v2 is preferred to using tags (which is now deprecated). If both tags_v2
and tags are specified, tags is ignored. tags_v2 enables
building complex filters through the use of "AND", "OR", and negation logic. Take the below input as an example:
{
"OR": [
{
"key": "subject",
"value": "holy-bible",
"negate": false
},
{
"key": "person-of-interest",
"value": "jesus christ",
"negate": false
},
{
"key": "genre",
"value": "religion",
"negate": true
},
{
"AND": [
{
"key": "subject",
"value": "tao-te-ching",
"negate": false
},
{
"key": "author",
"value": "lao-tzu",
"negate": false
}
]
}
]
}
In this case, files will be filtered such that:
- "subject" = "holy-bible" OR
- "person-of-interest" = "jesus christ" OR
- "genre" != "religion" OR
- "subject" = "tao-te-ching" AND "author" = "lao-tzu"
Note that the top level of the query must be either an "OR" or "AND" array. Currently, nesting is limited to 3. For tag blocks (those with "key", "value", and "negate" keys), the following typing rules apply:
- "key" isn't optional and must be a string
- "value" isn't optional and can be any or list[any]
- "negate" is optional and must be true or false. If present and true, then the filter block is negated in the resulting query. It is false by default.
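For example, the filter shown above could be passed from this SDK roughly as follows; a minimal sketch (the query and k values are illustrative):
// Sketch: the tags_v2 filter mirrors the JSON example above.
$result = $carbon->embeddings->getDocuments(
    query: "creation myths",
    k: 5,
    tags_v2: [
        "OR" => [
            [ "key" => "subject", "value" => "holy-bible", "negate" => false ],
            [ "key" => "person-of-interest", "value" => "jesus christ", "negate" => false ],
            [ "key" => "genre", "value" => "religion", "negate" => true ],
            [
                "AND" => [
                    [ "key" => "subject", "value" => "tao-te-ching", "negate" => false ],
                    [ "key" => "author", "value" => "lao-tzu", "negate" => false ],
                ],
            ],
        ],
    ],
);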
When querying embeddings, you can optionally specify the media_type parameter in your request. By default (if
not set), it is equal to "TEXT". This means that the query will be performed over files that have
been parsed as text (for now, this covers all files except image files). If it is equal to "IMAGE",
the query will be performed over image files (for now, .jpg and .png files). You can think of this
field as an additional filter on top of any filters set in file_ids and parent_file_ids.
When hybrid_search is set to true, a combination of keyword search and semantic search are used to rank
and select candidate embeddings during information retrieval. By default, these search methods are weighted
equally during the ranking process. To adjust the weight (or "importance") of each search method, you can use
the hybrid_search_tuning_parameters property. The description for the different tuning parameters are:
- weight_a: weight to assign to semantic search
- weight_b: weight to assign to keyword search
You must ensure that sum(weight_a, weight_b,..., weight_n) for all n weights is equal to 1. The equality
has an error tolerance of 0.001 to account for possible floating point issues.
In order to use hybrid search for a customer across a set of documents, two flags need to be enabled:
- Use the /modify_user_configuration endpoint to enable sparse_vectors for the customer. The payload body for this request is below:
{
"configuration_key_name": "sparse_vectors",
"value": {
"enabled": true
}
}
- Make sure hybrid search is enabled for the documents across which you want to perform the search. For the /uploadfile endpoint, this can be done by setting the following query parameter: generate_sparse_vectors=true
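Putting the pieces together, a minimal end-to-end sketch, assuming sparse_vectors has already been enabled for the customer via /modify_user_configuration as shown above:
// Upload with sparse vectors so the file is a candidate for hybrid search.
$carbon->files->upload(
    file: new \SplFileObject('/path/to/file.pdf'),
    generate_sparse_vectors: true,
);

// Query with hybrid search; weight_a (semantic) + weight_b (keyword) must sum to 1.
$result = $carbon->embeddings->getDocuments(
    query: "quarterly revenue",
    k: 10,
    hybrid_search: true,
    hybrid_search_tuning_parameters: [
        "weight_a" => 0.7,
        "weight_b" => 0.3,
    ],
);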
Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI's
multimodal model; for text, we support OpenAI's text-embedding-ada-002 and Cohere's embed-multilingual-v3.0.
The model can be specified via the embedding_model parameter (in the POST body for /embeddings, and a query
parameter in /uploadfile). If no model is supplied, text-embedding-ada-002 is used by default. When performing
embedding queries, embeddings from files that used the specified model will be considered in the query.
For example, if files A and B have embeddings generated with OPENAI, and files C and D have embeddings generated with
COHERE_MULTILINGUAL_V3, then by default, queries will only consider files A and B. If COHERE_MULTILINGUAL_V3 is
specified as the embedding_model in /embeddings, then only files C and D will be considered. Make sure that
the set of all files you want considered for a query have embeddings generated via the same model. For now, do not
set VERTEX_MULTIMODAL as an embedding_model. This model is used automatically by Carbon when it detects an image file.
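For instance, to restrict a query to files embedded with Cohere's model, a short sketch (enum value as used elsewhere in this reference):
// Only files whose embeddings were generated with COHERE_MULTILINGUAL_V3
// are considered by this query.
$result = $carbon->embeddings->getDocuments(
    query: "greeting in french",
    k: 3,
    embedding_model: "COHERE_MULTILINGUAL_V3",
);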
🛠️ Usage
$result = $carbon->embeddings->getDocuments(
    query: "a",
    k: 1,
    tags: [ "key" => "string_example" ],
    query_vector: [ 3.14 ],
    file_ids: [ 1 ],
    parent_file_ids: [ 1 ],
    include_all_children: false,
    tags_v2: [],
    include_tags: true,
    include_vectors: true,
    include_raw_file: true,
    hybrid_search: true,
    hybrid_search_tuning_parameters: [ "weight_a" => 0.5, "weight_b" => 0.5 ],
    media_type: "TEXT",
    embedding_model: "OPENAI",
    include_file_level_metadata: false,
    high_accuracy: false,
    rerank: [ "model" => "model_example" ],
    file_types_at_source: [ "string_example" ],
    exclude_cold_storage_files: false,
);
⚙️ Parameters
query: string
Query for which to get related chunks and embeddings.
k: int
Number of related chunks to return.
tags: array<string, Tags1>
A set of tags to limit the search to. Deprecated and may be removed in the future.
query_vector: float[]
Optional query vector for which to get related chunks and embeddings. It must have been generated by the same model used to generate the embeddings across which the search is being conducted. Cannot provide both query and query_vector.
file_ids: int[]
Optional list of file IDs to limit the search to
parent_file_ids: int[]
Optional list of parent file IDs to limit the search to. A parent file describes a file to which another file belongs (e.g. a folder)
include_all_children: bool
Flag to control whether or not to include all children of filtered files in the embedding search.
tags_v2: object
A set of tags to limit the search to. Use this instead of tags, which is deprecated.
include_tags: bool
Flag to control whether or not to include tags for each chunk in the response.
include_vectors: bool
Flag to control whether or not to include embedding vectors in the response.
include_raw_file: bool
Flag to control whether or not to include a signed URL to the raw file containing each chunk in the response.
hybrid_search: bool
Flag to control whether or not to perform hybrid search.
hybrid_search_tuning_parameters: HybridSearchTuningParamsNullable
media_type:
embedding_model:
include_file_level_metadata: bool
Flag to control whether or not to include file-level metadata in the response. This metadata will be included in the content_metadata field of each document along with chunk/embedding level metadata.
high_accuracy: bool
Flag to control whether or not to perform a high accuracy embedding search. By default, this is set to false. If true, the search may return more accurate results, but may take longer to complete.
rerank: RerankParamsNullable
file_types_at_source: AutoSyncedSourceTypesPropertyInner[]
Filter files based on their type at the source (for example help center tickets and articles)
exclude_cold_storage_files: bool
Flag to control whether or not to exclude files that are not in hot storage. If set to False, then an error will be returned if any filtered files are in cold storage.
🔄 Return
🌐 Endpoint
/embeddings POST
carbon.embeddings.getEmbeddingsAndChunks
Retrieve Embeddings And Content
🛠️ Usage
$result = $carbon->embeddings->getEmbeddingsAndChunks( filters: [ "user_file_id" => 1, "embedding_model" => "OPENAI", ], pagination: [ "limit" => 10, "offset" => 0, "starting_id" => 0, ], order_by: "created_at", order_dir: "desc", include_vectors: false );
⚙️ Parameters
filters: EmbeddingsAndChunksFilters
pagination: Pagination
order_by:
order_dir:
include_vectors: bool
🔄 Return
🌐 Endpoint
/text_chunks POST
carbon.embeddings.uploadChunksAndEmbeddings
Upload Chunks And Embeddings
🛠️ Usage
$result = $carbon->embeddings->uploadChunksAndEmbeddings( embedding_model: "OPENAI", chunks_and_embeddings: [ [ "file_id" => 1, "chunks_and_embeddings" => [ [ "chunk_number" => 1, "chunk" => "chunk_example", ] ], ] ], overwrite_existing: false, chunks_only: false, custom_credentials: [ "key" => [], ] );
⚙️ Parameters
embedding_model:
chunks_and_embeddings: SingleChunksAndEmbeddingsUploadInput[]
overwrite_existing: bool
chunks_only: bool
custom_credentials: array<string,object>
🔄 Return
🌐 Endpoint
/upload_chunks_and_embeddings POST
carbon.files.createUserFileTags
A tag is a key-value pair that can be added to a file. This pair can then be used for searches (e.g. embedding searches) in order to narrow down the scope of the search. A file can have any number of tags. The following are reserved keys that cannot be used:
- db_embedding_id
- organization_id
- user_id
- organization_user_file_id
Carbon currently supports two data types for tag values - string and list<string>.
Keys can only be string. If values other than string and list<string> are used,
they're automatically converted to strings (e.g. 4 will become "4").
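A short sketch with one string value and one list<string> value (tag names are illustrative):
$result = $carbon->files->createUserFileTags(
    tags: [
        "project" => "alpha",          // string value
        "topics" => ["finance", "q3"], // list<string> value; a non-string like 4 would be stored as "4"
    ],
    organization_user_file_id: 1,
);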
🛠️ Usage
$result = $carbon->files->createUserFileTags( tags: [ "key" => "string_example", ], organization_user_file_id: 1 );
⚙️ Parameters
tags: array<string, Tags1>
organization_user_file_id: int
🔄 Return
🌐 Endpoint
/create_user_file_tags POST
carbon.files.deleteFileTags
Delete File Tags
🛠️ Usage
$result = $carbon->files->deleteFileTags( tags: [ "string_example" ], organization_user_file_id: 1 );
⚙️ Parameters
tags: string[]
organization_user_file_id: int
🔄 Return
🌐 Endpoint
/delete_user_file_tags POST
carbon.files.deleteMany
Delete Files Endpoint
🛠️ Usage
$result = $carbon->files->deleteMany( file_ids: [ 1 ], sync_statuses: [ "string_example" ], delete_non_synced_only: false, send_webhook: false, delete_child_files: false );
⚙️ Parameters
file_ids: int[]
sync_statuses: []
delete_non_synced_only: bool
send_webhook: bool
delete_child_files: bool
🔄 Return
🌐 Endpoint
/delete_files POST
carbon.files.deleteV2
Delete Files V2 Endpoint
🛠️ Usage
$result = $carbon->files->deleteV2( filters: [ "include_all_children" => false, "non_synced_only" => false, ], send_webhook: false, preserve_file_record: false );
⚙️ Parameters
filters: OrganizationUserFilesToSyncFilters
send_webhook: bool
preserve_file_record: bool
Whether or not to delete all data related to the file from the database, BUT to preserve the file metadata, allowing for resyncs. By default preserve_file_record is false, which means that all data related to the file as well as its metadata will be deleted. Note that even if preserve_file_record is true, raw files uploaded via the uploadfile endpoint still cannot be resynced.
🔄 Return
🌐 Endpoint
/delete_files_v2 POST
carbon.files.getParsedFile
This route is deprecated. Use /user_files_v2 instead.
🛠️ Usage
$result = $carbon->files->getParsedFile( file_id: 1 );
⚙️ Parameters
file_id: int
🔄 Return
🌐 Endpoint
/parsed_file/{file_id} GET
carbon.files.getRawFile
This route is deprecated. Use /user_files_v2 instead.
🛠️ Usage
$result = $carbon->files->getRawFile( file_id: 1 );
⚙️ Parameters
file_id: int
🔄 Return
🌐 Endpoint
/raw_file/{file_id} GET
carbon.files.modifyColdStorageParameters
Modify Cold Storage Parameters
🛠️ Usage
$result = $carbon->files->modifyColdStorageParameters( filters: [ "include_all_children" => false, "non_synced_only" => false, ], enable_cold_storage: true, hot_storage_time_to_live: 1 );
⚙️ Parameters
filters: OrganizationUserFilesToSyncFilters
enable_cold_storage: bool
hot_storage_time_to_live: int
🔄 Return
bool
🌐 Endpoint
/modify_cold_storage_parameters POST
carbon.files.moveToHotStorage
Move To Hot Storage
🛠️ Usage
$result = $carbon->files->moveToHotStorage( filters: [ "include_all_children" => false, "non_synced_only" => false, ] );
⚙️ Parameters
filters: OrganizationUserFilesToSyncFilters
🔄 Return
bool
🌐 Endpoint
/move_to_hot_storage POST
carbon.files.queryUserFiles
For pre-filtering documents, using tags_v2 is preferred to using tags (which is now deprecated). If both tags_v2
and tags are specified, tags is ignored. tags_v2 enables
building complex filters through the use of "AND", "OR", and negation logic. Take the below input as an example:
{
"OR": [
{
"key": "subject",
"value": "holy-bible",
"negate": false
},
{
"key": "person-of-interest",
"value": "jesus christ",
"negate": false
},
{
"key": "genre",
"value": "religion",
"negate": true
},
{
"AND": [
{
"key": "subject",
"value": "tao-te-ching",
"negate": false
},
{
"key": "author",
"value": "lao-tzu",
"negate": false
}
]
}
]
}
In this case, files will be filtered such that:
- "subject" = "holy-bible" OR
- "person-of-interest" = "jesus christ" OR
- "genre" != "religion" OR
- "subject" = "tao-te-ching" AND "author" = "lao-tzu"
Note that the top level of the query must be either an "OR" or "AND" array. Currently, nesting is limited to 3. For tag blocks (those with "key", "value", and "negate" keys), the following typing rules apply:
- "key" isn't optional and must be a string
- "value" isn't optional and can be any or list[any]
- "negate" is optional and must be true or false. If present and true, then the filter block is negated in the resulting query. It is false by default.
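As with /embeddings above, the same tags_v2 shape applies here; a minimal sketch, assuming tags_v2 is passed as a key of the filters object (per OrganizationUserFilesToSyncFilters):
// Sketch: filter the file listing with a nested tags_v2 block.
$result = $carbon->files->queryUserFiles(
    pagination: [ "limit" => 10, "offset" => 0 ],
    filters: [
        "tags_v2" => [
            "AND" => [
                [ "key" => "subject", "value" => "tao-te-ching", "negate" => false ],
                [ "key" => "author", "value" => "lao-tzu", "negate" => false ],
            ],
        ],
    ],
);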
🛠️ Usage
$result = $carbon->files->queryUserFiles( pagination: [ "limit" => 10, "offset" => 0, "starting_id" => 0, ], order_by: "created_at", order_dir: "desc", filters: [ "include_all_children" => false, "non_synced_only" => false, ], include_raw_file: true, include_parsed_text_file: true, include_additional_files: true, presigned_url_expiry_time_seconds: 3600 );
⚙️ Parameters
pagination: Pagination
order_by:
order_dir:
filters: OrganizationUserFilesToSyncFilters
include_raw_file: bool
If true, the query will return presigned URLs for the raw file. Only relevant for the /user_files_v2 endpoint.
include_parsed_text_file: bool
If true, the query will return presigned URLs for the parsed text file. Only relevant for the /user_files_v2 endpoint.
include_additional_files: bool
If true, the query will return presigned URLs for additional files. Only relevant for the /user_files_v2 endpoint.
presigned_url_expiry_time_seconds: int
The expiry time for the presigned URLs. Only relevant for the /user_files_v2 endpoint.
🔄 Return
🌐 Endpoint
/user_files_v2 POST
carbon.files.queryUserFilesDeprecated
This route is deprecated. Use /user_files_v2 instead.
🛠️ Usage
$result = $carbon->files->queryUserFilesDeprecated( pagination: [ "limit" => 10, "offset" => 0, "starting_id" => 0, ], order_by: "created_at", order_dir: "desc", filters: [ "include_all_children" => false, "non_synced_only" => false, ], include_raw_file: true, include_parsed_text_file: true, include_additional_files: true, presigned_url_expiry_time_seconds: 3600 );
⚙️ Parameters
pagination: Pagination
order_by:
order_dir:
filters: OrganizationUserFilesToSyncFilters
include_raw_file: bool
If true, the query will return presigned URLs for the raw file. Only relevant for the /user_files_v2 endpoint.
include_parsed_text_file: bool
If true, the query will return presigned URLs for the parsed text file. Only relevant for the /user_files_v2 endpoint.
include_additional_files: bool
If true, the query will return presigned URLs for additional files. Only relevant for the /user_files_v2 endpoint.
presigned_url_expiry_time_seconds: int
The expiry time for the presigned URLs. Only relevant for the /user_files_v2 endpoint.
🔄 Return
🌐 Endpoint
/user_files POST
carbon.files.resync
Resync File
🛠️ Usage
$result = $carbon->files->resync( file_id: 1, chunk_size: 1, chunk_overlap: 1, force_embedding_generation: false, skip_file_processing: false );
⚙️ Parameters
file_id: int
chunk_size: int
chunk_overlap: int
force_embedding_generation: bool
skip_file_processing: bool
🔄 Return
🌐 Endpoint
/resync_file POST
carbon.files.upload
This endpoint is used to directly upload local files to Carbon. The POST request should be a multipart form request.
Note that the set_page_as_boundary query parameter is applicable only to PDFs for now. When this value is set,
PDF chunks are at most one page long. Additional information can be retrieved for each chunk, however, namely the coordinates
of the bounding box around the chunk (this can be used for things like text highlighting). Following is a description
of all possible query parameters:
- chunk_size: the chunk size (in tokens) applied when splitting the document
- chunk_overlap: the chunk overlap (in tokens) applied when splitting the document
- skip_embedding_generation: whether or not to skip the generation of chunks and embeddings
- set_page_as_boundary: described above
- embedding_model: the model used to generate embeddings for the document chunks
- use_ocr: whether or not to use OCR as a preprocessing step prior to generating chunks. Valid for PDFs, JPEGs, and PNGs
- generate_sparse_vectors: whether or not to generate sparse vectors for the file. Required for hybrid search.
- prepend_filename_to_chunks: whether or not to prepend the filename to the chunk text
Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI's
multimodal model; for text, we support OpenAI's text-embedding-ada-002 and Cohere's embed-multilingual-v3.0.
The model can be specified via the embedding_model parameter (in the POST body for /embeddings, and a query
parameter in /uploadfile). If no model is supplied, text-embedding-ada-002 is used by default. When performing
embedding queries, embeddings from files that used the specified model will be considered in the query.
For example, if files A and B have embeddings generated with OPENAI, and files C and D have embeddings generated with
COHERE_MULTILINGUAL_V3, then by default, queries will only consider files A and B. If COHERE_MULTILINGUAL_V3 is
specified as the embedding_model in /embeddings, then only files C and D will be considered. Make sure that
the set of all files you want considered for a query have embeddings generated via the same model. For now, do not
set VERTEX_MULTIMODAL as an embedding_model. This model is used automatically by Carbon when it detects an image file.
🛠️ Usage
$result = $carbon->files->upload(
    file: new \SplFileObject('/path/to/file', 'r'),
    chunk_size: 1,
    chunk_overlap: 1,
    skip_embedding_generation: false,
    set_page_as_boundary: false,
    embedding_model: "string_example",
    use_ocr: false,
    generate_sparse_vectors: false,
    prepend_filename_to_chunks: false,
    max_items_per_chunk: 1,
    parse_pdf_tables_with_ocr: false,
    detect_audio_language: false,
    transcription_service: "assemblyai",
    include_speaker_labels: false,
    media_type: "TEXT",
    split_rows: false,
    enable_cold_storage: false,
    hot_storage_time_to_live: 1,
    generate_chunks_only: false,
    store_file_only: false,
);
⚙️ Parameters
file: \SplFileObject
chunk_size: int
Chunk size in tiktoken tokens to be used when processing file.
chunk_overlap: int
Chunk overlap in tiktoken tokens to be used when processing file.
skip_embedding_generation: bool
Flag to control whether or not embeddings should be generated and stored when processing file.
set_page_as_boundary: bool
Flag to control whether or not to set a page's worth of content as the maximum amount of content that can appear in a chunk. Only valid for PDFs. See the route description for more information.
embedding_model:
Embedding model that will be used to embed file chunks.
use_ocr: bool
Whether or not to use OCR when processing files. Valid for PDFs, JPEGs, and PNGs. Useful for documents with tables, images, and/or scanned text.
generate_sparse_vectors: bool
Whether or not to generate sparse vectors for the file. This is required for the file to be a candidate for hybrid search.
prepend_filename_to_chunks: bool
Whether or not to prepend the file's name to chunks.
max_items_per_chunk: int
Number of objects per chunk. For csv, tsv, xlsx, and json files only.
parse_pdf_tables_with_ocr: bool
Whether to use rich table parsing when use_ocr is enabled.
detect_audio_language: bool
Whether to automatically detect the language of the uploaded audio file.
transcription_service:
The transcription service to use for audio files. If no service is specified, 'deepgram' will be used.
include_speaker_labels: bool
Detect multiple speakers and label segments of speech by speaker for audio files.
media_type:
The media type of the file. If not provided, it will be inferred from the file extension.
split_rows: bool
Whether to split tabular rows into chunks. Currently only valid for CSV, TSV, and XLSX files.
enable_cold_storage: bool
Enable cold storage for the file. If set to true, the file will be moved to cold storage after a certain period of inactivity. Default is false.
hot_storage_time_to_live: int
Time in days after which the file will be moved to cold storage. Must be one of [1, 3, 7, 14, 30].
generate_chunks_only: bool
If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag.
store_file_only: bool
If this flag is enabled, the file will be stored with Carbon, but no processing will be done.
🔄 Return
🌐 Endpoint
/uploadfile POST
carbon.files.uploadFromUrl
Create Upload File From Url
🛠️ Usage
$result = $carbon->files->uploadFromUrl(
    url: "string_example",
    file_name: "string_example",
    chunk_size: 1,
    chunk_overlap: 1,
    skip_embedding_generation: false,
    set_page_as_boundary: false,
    embedding_model: "OPENAI",
    generate_sparse_vectors: false,
    use_textract: false,
    prepend_filename_to_chunks: false,
    max_items_per_chunk: 1,
    parse_pdf_tables_with_ocr: false,
    detect_audio_language: false,
    transcription_service: "assemblyai",
    include_speaker_labels: false,
    media_type: "TEXT",
    split_rows: false,
    cold_storage_params: [ "enable_cold_storage" => false ],
    generate_chunks_only: false,
    store_file_only: false,
);
⚙️ Parameters
url: string
file_name: string
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
set_page_as_boundary: bool
embedding_model:
generate_sparse_vectors: bool
use_textract: bool
prepend_filename_to_chunks: bool
max_items_per_chunk: int
Number of objects per chunk. For csv, tsv, xlsx, and json files only.
parse_pdf_tables_with_ocr: bool
detect_audio_language: bool
transcription_service:
include_speaker_labels: bool
media_type:
split_rows: bool
cold_storage_params: ColdStorageProps
generate_chunks_only: bool
If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag.
store_file_only: bool
If this flag is enabled, the file will be stored with Carbon, but no processing will be done.
🔄 Return
🌐 Endpoint
/upload_file_from_url POST
carbon.files.uploadText
Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI's
multimodal model; for text, we support OpenAI's text-embedding-ada-002 and Cohere's embed-multilingual-v3.0.
The model can be specified via the embedding_model parameter (in the POST body for /embeddings, and a query
parameter in /uploadfile). If no model is supplied, text-embedding-ada-002 is used by default. When performing
embedding queries, embeddings from files that used the specified model will be considered in the query.
For example, if files A and B have embeddings generated with OPENAI, and files C and D have embeddings generated with
COHERE_MULTILINGUAL_V3, then by default, queries will only consider files A and B. If COHERE_MULTILINGUAL_V3 is
specified as the embedding_model in /embeddings, then only files C and D will be considered. Make sure that
the set of all files you want considered for a query have embeddings generated via the same model. For now, do not
set VERTEX_MULTIMODAL as an embedding_model. This model is used automatically by Carbon when it detects an image file.
🛠️ Usage
$result = $carbon->files->uploadText(
    contents: "aaaaa",
    name: "string_example",
    chunk_size: 1,
    chunk_overlap: 1,
    skip_embedding_generation: false,
    overwrite_file_id: 1,
    embedding_model: "OPENAI",
    generate_sparse_vectors: false,
    cold_storage_params: [ "enable_cold_storage" => false ],
    generate_chunks_only: false,
    store_file_only: false,
);
⚙️ Parameters
contents: string
name: string
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
overwrite_file_id: int
embedding_model:
generate_sparse_vectors: bool
cold_storage_params: ColdStorageProps
generate_chunks_only: bool
If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag.
store_file_only: bool
If this flag is enabled, the file will be stored with Carbon, but no processing will be done.
🔄 Return
🌐 Endpoint
/upload_text POST
carbon.github.getIssue
Issue
🛠️ Usage
$result = $carbon->github->getIssue( issue_number: 1, include_remote_data: false, data_source_id: 1, repository: "string_example" );
⚙️ Parameters
issue_number: int
include_remote_data: bool
data_source_id: int
repository: string
🔄 Return
🌐 Endpoint
/integrations/data/github/issues/{issue_number} GET
carbon.github.getIssues
Issues
🛠️ Usage
$result = $carbon->github->getIssues( data_source_id: 1, repository: "string_example", include_remote_data: false, page: 1, page_size: 30, next_cursor: "string_example", filters: [ "state" => "closed", ], order_by: "created", order_dir: "asc" );
⚙️ Parameters
data_source_id: int
repository: string
Full name of the repository, denoted as {owner}/{repo}
include_remote_data: bool
page: int
page_size: int
next_cursor: string
filters: IssuesFilter
order_by:
order_dir:
🔄 Return
🌐 Endpoint
/integrations/data/github/issues POST
carbon.github.getPr
Get Pr
🛠️ Usage
$result = $carbon->github->getPr( pull_number: 1, include_remote_data: false, data_source_id: 1, repository: "string_example" );
⚙️ Parameters
pull_number: int
include_remote_data: bool
data_source_id: int
repository: string
🔄 Return
🌐 Endpoint
/integrations/data/github/pull_requests/{pull_number} GET
carbon.github.getPrComments
Pr Comments
🛠️ Usage
$result = $carbon->github->getPrComments( data_source_id: 1, repository: "string_example", pull_number: 1, include_remote_data: false, page: 1, page_size: 30, next_cursor: "string_example", order_by: "created", order_dir: "asc" );
⚙️ Parameters
data_source_id: int
repository: string
Full name of the repository, denoted as {owner}/{repo}
pull_number: int
include_remote_data: bool
page: int
page_size: int
next_cursor: string
order_by:
order_dir:
🔄 Return
🌐 Endpoint
/integrations/data/github/pull_requests/comments POST
carbon.github.getPrCommits
Pr Commits
🛠️ Usage
$result = $carbon->github->getPrCommits( data_source_id: 1, repository: "string_example", pull_number: 1, include_remote_data: false, page: 1, page_size: 30, next_cursor: "string_example" );
⚙️ Parameters
data_source_id: int
repository: string
Full name of the repository, denoted as {owner}/{repo}
pull_number: int
include_remote_data: bool
page: int
page_size: int
next_cursor: string
🔄 Return
🌐 Endpoint
/integrations/data/github/pull_requests/commits POST
carbon.github.getPrFiles
Pr Files
🛠️ Usage
$result = $carbon->github->getPrFiles( data_source_id: 1, repository: "string_example", pull_number: 1, include_remote_data: false, page: 1, page_size: 30, next_cursor: "string_example" );
⚙️ Parameters
data_source_id: int
repository: string
Full name of the repository, denoted as {owner}/{repo}
pull_number: int
include_remote_data: bool
page: int
page_size: int
next_cursor: string
🔄 Return
🌐 Endpoint
/integrations/data/github/pull_requests/files POST
carbon.github.getPullRequests
Get Prs
🛠️ Usage
$result = $carbon->github->getPullRequests( data_source_id: 1, repository: "string_example", include_remote_data: false, page: 1, page_size: 30, next_cursor: "string_example", filters: [ "state" => "closed", ], order_by: "created", order_dir: "asc" );
⚙️ Parameters
data_source_id: int
repository: string
Full name of the repository, denoted as {owner}/{repo}
include_remote_data: bool
page: int
page_size: int
next_cursor: string
filters: PullRequestFilters
order_by:
order_dir:
🔄 Return
🌐 Endpoint
/integrations/data/github/pull_requests POST
carbon.integrations.cancel
Cancel Data Source Items Sync
🛠️ Usage
$result = $carbon->integrations->cancel( data_source_id: 1 );
⚙️ Parameters
data_source_id: int
🔄 Return
🌐 Endpoint
/integrations/items/sync/cancel POST
carbon.integrations.connectDataSource
Connect Data Source
🛠️ Usage
$result = $carbon->integrations->connectDataSource(
    authentication: [
        "source" => "GOOGLE_DRIVE",
        "access_token" => "access_token_example",
    ],
    sync_options: [
        "chunk_size" => 1500,
        "chunk_overlap" => 20,
        "skip_embedding_generation" => false,
        "embedding_model" => "OPENAI",
        "generate_sparse_vectors" => false,
        "prepend_filename_to_chunks" => false,
        "sync_files_on_connection" => true,
        "set_page_as_boundary" => false,
        "enable_file_picker" => true,
        "sync_source_items" => true,
        "incremental_sync" => false,
    ],
);
⚙️ Parameters
authentication: AuthenticationProperty
sync_options: SyncOptions
🔄 Return
🌐 Endpoint
/integrations/connect POST
carbon.integrations.connectDocument360
You will need an access token to connect your Document360 account. To obtain an access token, follow the steps highlighted here https://apidocs.document360.com/apidocs/api-token.
🛠️ Usage
$result = $carbon->integrations->connectDocument360(
    account_email: "string_example",
    access_token: "string_example",
    tags: [],
    chunk_size: 1500,
    chunk_overlap: 20,
    skip_embedding_generation: false,
    embedding_model: "OPENAI",
    generate_sparse_vectors: false,
    prepend_filename_to_chunks: false,
    sync_files_on_connection: true,
    request_id: "string_example",
    sync_source_items: true,
    file_sync_config: [
        "auto_synced_source_types" => ["ARTICLE"],
        "sync_attachments" => false,
        "detect_audio_language" => false,
        "transcription_service" => "assemblyai",
        "include_speaker_labels" => false,
        "split_rows" => false,
        "generate_chunks_only" => false,
        "store_file_only" => false,
        "skip_file_processing" => false,
        "parsed_text_format" => "PLAIN_TEXT",
    ],
    data_source_tags: [],
);
⚙️ Parameters
account_email: string
This email will be used to identify your Carbon data source. It should have access to the Document360 account you wish to connect.
access_token: string
tags: object
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
embedding_model:
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
sync_files_on_connection: bool
request_id: string
sync_source_items: bool
Enabling this flag will fetch all available content from the source to be listed via the list items endpoint.
file_sync_config: FileSyncConfigNullable
data_source_tags: object
Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.
🔄 Return
🌐 Endpoint
/integrations/document360 POST
carbon.integrations.connectFreshdesk
Refer to this article to obtain an API key: https://support.freshdesk.com/en/support/solutions/articles/215517. Make sure that your API key has permission to read solutions from your account and that you are on a paid plan. Once you have an API key, you can make a request to this endpoint along with your Freshdesk domain. This will trigger an automatic sync of the articles in your "solutions" tab. Additional parameters below can be used to associate data with the synced articles or modify the sync behavior.
🛠️ Usage
$result = $carbon->integrations->connectFreshdesk(
    domain: "string_example",
    api_key: "string_example",
    tags: [],
    chunk_size: 1500,
    chunk_overlap: 20,
    skip_embedding_generation: false,
    embedding_model: "OPENAI",
    generate_sparse_vectors: false,
    prepend_filename_to_chunks: false,
    sync_files_on_connection: true,
    request_id: "string_example",
    sync_source_items: true,
    file_sync_config: [
        "auto_synced_source_types" => ["ARTICLE"],
        "sync_attachments" => false,
        "detect_audio_language" => false,
        "transcription_service" => "assemblyai",
        "include_speaker_labels" => false,
        "split_rows" => false,
        "generate_chunks_only" => false,
        "store_file_only" => false,
        "skip_file_processing" => false,
        "parsed_text_format" => "PLAIN_TEXT",
    ],
    data_source_tags: [],
);
⚙️ Parameters
domain: string
api_key: string
tags: object
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
embedding_model:
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
sync_files_on_connection: bool
request_id: string
sync_source_items: bool
Enabling this flag will fetch all available content from the source to be listed via the list items endpoint.
file_sync_config: FileSyncConfigNullable
data_source_tags: object
Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.
🔄 Return
🌐 Endpoint
/integrations/freshdesk POST
carbon.integrations.connectGitbook
You will need an access token to connect your Gitbook account. Note that the permissions will be defined by the user generating the access token, so make sure you have permission to access the spaces you will be syncing. Refer to this article for more details: https://developer.gitbook.com/gitbook-api/authentication. Additionally, you need to specify the name of the organization you will be syncing data from.
🛠️ Usage
$result = $carbon->integrations->connectGitbook(
    organization: "string_example",
    access_token: "string_example",
    tags: [],
    chunk_size: 1500,
    chunk_overlap: 20,
    skip_embedding_generation: false,
    embedding_model: "OPENAI",
    generate_sparse_vectors: false,
    prepend_filename_to_chunks: false,
    sync_files_on_connection: true,
    request_id: "string_example",
    sync_source_items: true,
    file_sync_config: [
        "auto_synced_source_types" => ["ARTICLE"],
        "sync_attachments" => false,
        "detect_audio_language" => false,
        "transcription_service" => "assemblyai",
        "include_speaker_labels" => false,
        "split_rows" => false,
        "generate_chunks_only" => false,
        "store_file_only" => false,
        "skip_file_processing" => false,
        "parsed_text_format" => "PLAIN_TEXT",
    ],
    data_source_tags: [],
);
⚙️ Parameters
organization: string
access_token: string
tags: object
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
embedding_model:
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
sync_files_on_connection: bool
request_id: string
sync_source_items: bool
Enabling this flag will fetch all available content from the source to be listed via the list items endpoint.
file_sync_config: FileSyncConfigNullable
data_source_tags: object
Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.
🔄 Return
🌐 Endpoint
/integrations/gitbook POST
carbon.integrations.connectGuru
You will need an access token to connect your Guru account. To obtain an access token, follow the steps highlighted here https://help.getguru.com/docs/gurus-api#obtaining-a-user-token. The username should be your Guru username.
🛠️ Usage
$result = $carbon->integrations->connectGuru(
    username: "string_example",
    access_token: "string_example",
    tags: [],
    chunk_size: 1500,
    chunk_overlap: 20,
    skip_embedding_generation: false,
    embedding_model: "OPENAI",
    generate_sparse_vectors: false,
    prepend_filename_to_chunks: false,
    sync_files_on_connection: true,
    request_id: "string_example",
    sync_source_items: true,
    file_sync_config: [
        "auto_synced_source_types" => ["ARTICLE"],
        "sync_attachments" => false,
        "detect_audio_language" => false,
        "transcription_service" => "assemblyai",
        "include_speaker_labels" => false,
        "split_rows" => false,
        "generate_chunks_only" => false,
        "store_file_only" => false,
        "skip_file_processing" => false,
        "parsed_text_format" => "PLAIN_TEXT",
    ],
    data_source_tags: [],
);
⚙️ Parameters
username: string
access_token: string
tags: object
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
embedding_model:
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
sync_files_on_connection: bool
request_id: string
sync_source_items: bool
Enabling this flag will fetch all available content from the source to be listed via the list items endpoint.
file_sync_config: FileSyncConfigNullable
data_source_tags: object
Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.
🔄 Return
🌐 Endpoint
/integrations/guru POST
carbon.integrations.createAwsIamUser
This endpoint can be used to connect S3 as well as Digital Ocean Spaces (S3 compatible)
For S3, create a new IAM user with permissions to:
- List all buckets.
- Read from the specific buckets and objects to sync with Carbon. Ensure any future buckets or objects carry the same permissions.
🛠️ Usage
$result = $carbon->integrations->createAwsIamUser( access_key: "string_example", access_key_secret: "string_example", sync_source_items: true, endpoint_url: "string_example", data_source_tags: [] );
⚙️ Parameters
access_key: string
access_key_secret: string
sync_source_items: bool
Enabling this flag will fetch all available content from the source to be listed via the list items endpoint.
endpoint_url: string
You can specify a Digital Ocean endpoint URL to connect a Digital Ocean Space through this endpoint. The URL should be of format .digitaloceanspaces.com. It's not required for S3 buckets.
data_source_tags: object
Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.
🔄 Return
🌐 Endpoint
/integrations/s3 POST
carbon.integrations.getOauthUrl
This endpoint can be used to generate the following URLs
- An OAuth URL for OAuth based connectors
- A file syncing URL which skips the OAuth flow if the user already has a valid access token and takes them to the success state.
🛠️ Usage
$result = $carbon->integrations->getOauthUrl(
    service: "BOX",
    tags: null,
    scope: "string_example",
    scopes: [],
    chunk_size: 1500,
    chunk_overlap: 20,
    skip_embedding_generation: false,
    embedding_model: "OPENAI",
    zendesk_subdomain: "string_example",
    microsoft_tenant: "string_example",
    sharepoint_site_name: "string_example",
    confluence_subdomain: "string_example",
    generate_sparse_vectors: false,
    prepend_filename_to_chunks: false,
    max_items_per_chunk: 1,
    salesforce_domain: "string_example",
    sync_files_on_connection: true,
    set_page_as_boundary: false,
    data_source_id: 1,
    connecting_new_account: false,
    request_id: "string_example",
    use_ocr: false,
    parse_pdf_tables_with_ocr: false,
    enable_file_picker: true,
    sync_source_items: true,
    incremental_sync: false,
    file_sync_config: [
        "auto_synced_source_types" => ["ARTICLE"],
        "sync_attachments" => false,
        "detect_audio_language" => false,
        "transcription_service" => "assemblyai",
        "include_speaker_labels" => false,
        "split_rows" => false,
        "generate_chunks_only" => false,
        "store_file_only" => false,
        "skip_file_processing" => false,
        "parsed_text_format" => "PLAIN_TEXT",
    ],
    automatically_open_file_picker: true,
    gong_account_email: "string_example",
    servicenow_credentials: [
        "instance_subdomain" => "instance_subdomain_example",
        "client_id" => "client_id_example",
        "client_secret" => "client_secret_example",
        "redirect_uri" => "redirect_uri_example",
    ],
    data_source_tags: [],
);
⚙️ Parameters
service:
tags:
scope: string
scopes: string[]
List of scopes to request from the OAuth provider. Please note that the scopes will be used as-is, not combined with the defaults that Carbon uses. One scope should be one array element.
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
embedding_model:
zendesk_subdomain: string
microsoft_tenant: string
sharepoint_site_name: string
confluence_subdomain: string
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
max_items_per_chunk: int
Number of objects per chunk. For csv, tsv, xlsx, and json files only.
salesforce_domain: string
sync_files_on_connection: bool
Used to specify whether Carbon should attempt to sync all your files automatically when authorization is complete. This is only supported for a subset of connectors and will be ignored for the rest. Supported connectors: Intercom, Zendesk, Gitbook, Confluence, Salesforce, Freshdesk
set_page_as_boundary: bool
data_source_id: int
Used to specify a data source to sync from if you have multiple connected. It can be skipped if you only have one data source of that type connected or are connecting a new account.
connecting_new_account: bool
Used to connect a new data source. If not specified, we will attempt to create a sync URL for an existing data source based on type and ID.
request_id: string
This request id will be added to all files that get synced using the generated OAuth URL
use_ocr: bool
Enable OCR for files that support it. Supported formats: pdf, png, jpg
parse_pdf_tables_with_ocr: bool
enable_file_picker: bool
Enable integration's file picker for sources that support it. Supported sources: BOX, DROPBOX, GOOGLE_DRIVE, ONEDRIVE, SHAREPOINT
sync_source_items: bool
Enabling this flag will fetch all available content from the source to be listed via the list items endpoint.
incremental_sync: bool
Only sync files if they have not already been synced or if the embedding properties have changed. This flag is currently supported by ONEDRIVE, GOOGLE_DRIVE, BOX, DROPBOX, INTERCOM, GMAIL, OUTLOOK, ZENDESK, CONFLUENCE, NOTION, SHAREPOINT, SERVICENOW. It will be ignored for other data sources.
file_sync_config: FileSyncConfigNullable
automatically_open_file_picker: bool
Automatically open source file picker after the OAuth flow is complete. This flag is currently supported by BOX, DROPBOX, GOOGLE_DRIVE, ONEDRIVE, SHAREPOINT. It will be ignored for other data sources.
gong_account_email: string
If you are connecting a Gong account, you need to input the email of the account you wish to connect. This email will be used to identify your Carbon data source.
servicenow_credentials: ServiceNowCredentialsNullable
data_source_tags: object
Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.
🔄 Return
🌐 Endpoint
/integrations/oauth_url POST
carbon.integrations.listConfluencePages
This endpoint has been deprecated. Use /integrations/items/list instead.
To begin listing a user's Confluence pages, at least a data_source_id of a connected
Confluence account must be specified. This base request returns a list of root pages for
every space the user has access to in a Confluence instance. To traverse further down
the user's page directory, additional requests to this endpoint can be made with the same
data_source_id and with parent_id set to the id of a page from a previous request. For
convenience, the has_children property in each directory item in the response list will
flag which pages will return non-empty lists of pages when set as the parent_id.
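A sketch of that traversal, assuming each directory item in the response can be read as an array with id and has_children keys (the generated response model may differ):
// Hypothetical depth-first walk of a user's Confluence page tree.
function listAllConfluencePages($carbon, int $dataSourceId, ?string $parentId = null): array
{
    $result = $carbon->integrations->listConfluencePages(
        data_source_id: $dataSourceId,
        parent_id: $parentId,
    );
    $pages = [];
    foreach ($result["items"] ?? [] as $item) { // assumed response shape
        $pages[] = $item;
        if (!empty($item["has_children"])) {
            $pages = array_merge(
                $pages,
                listAllConfluencePages($carbon, $dataSourceId, $item["id"])
            );
        }
    }
    return $pages;
}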
🛠️ Usage
$result = $carbon->integrations->listConfluencePages( data_source_id: 1, parent_id: "string_example" );
⚙️ Parameters
data_source_id: int
parent_id: string
🔄 Return
🌐 Endpoint
/integrations/confluence/list POST
carbon.integrations.listConversations
List all of your public and private channels, DMs, and Group DMs. The ID from the response
can be used as a filter to sync messages to Carbon.
types: Comma separated list of types. Available types are im (DMs), mpim (group DMs), public_channel, and private_channel.
Defaults to public_channel.
cursor: Used for pagination. If next_cursor is returned in response, you need to pass it as the cursor in the next request
data_source_id: Data source needs to be specified if you have linked multiple slack accounts
exclude_archived: Should archived conversations be excluded, defaults to true
🛠️ Usage
$result = $carbon->integrations->listConversations( types: "public_channel", cursor: "string_example", data_source_id: 1, exclude_archived: true );
⚙️ Parameters
types: string
cursor: string
data_source_id: int
exclude_archived: bool
🔄 Return
object
🌐 Endpoint
/integrations/slack/conversations GET
carbon.integrations.listDataSourceItems
List Data Source Items
🛠️ Usage
$result = $carbon->integrations->listDataSourceItems( data_source_id: 1, parent_id: "string_example", filters: [ ], pagination: [ "limit" => 10, "offset" => 0, "starting_id" => 0, ], order_by: "name", order_dir: "asc" );
⚙️ Parameters
data_source_id: int
parent_id: string
filters: ListItemsFiltersNullable
pagination: Pagination
order_by:
order_dir:
🔄 Return
🌐 Endpoint
/integrations/items/list POST
carbon.integrations.listFolders
After connecting your Outlook account, you can use this endpoint to list all of your folders in Outlook. This includes both system folders like "inbox" and user-created folders.
🛠️ Usage
$result = $carbon->integrations->listFolders( data_source_id: 1 );
⚙️ Parameters
data_source_id: int
🔄 Return
object
🌐 Endpoint
/integrations/outlook/user_folders GET
carbon.integrations.listGitbookSpaces
After connecting your Gitbook account, you can use this endpoint to list all of your spaces under the current organization.
🛠️ Usage
$result = $carbon->integrations->listGitbookSpaces( data_source_id: 1 );
⚙️ Parameters
data_source_id: int
🔄 Return
object
🌐 Endpoint
/integrations/gitbook/spaces GET
carbon.integrations.listLabels
After connecting your Gmail account, you can use this endpoint to list all of your labels. User-created labels will have the type "user" and Gmail's default labels will have the type "system".
🛠️ Usage
$result = $carbon->integrations->listLabels( data_source_id: 1 );
⚙️ Parameters
data_source_id: int
🔄 Return
object
🌐 Endpoint
/integrations/gmail/user_labels GET
carbon.integrations.listOutlookCategories
After connecting your Outlook account, you can use this endpoint to list all of your categories in Outlook. We currently support listing up to 250 categories.
🛠️ Usage
$result = $carbon->integrations->listOutlookCategories( data_source_id: 1 );
⚙️ Parameters
data_source_id: int
🔄 Return
object
🌐 Endpoint
/integrations/outlook/user_categories GET
carbon.integrations.listRepos
Once you have connected your GitHub account, you can use this endpoint to list the repositories your account has access to. You can use a data source ID or username to fetch from a specific account.
🛠️ Usage
$result = $carbon->integrations->listRepos( per_page: 30, page: 1, data_source_id: 1 );
⚙️ Parameters
per_page: int
page: int
data_source_id: int
🔄 Return
object
🌐 Endpoint
/integrations/github/repos GET
carbon.integrations.listSharepointSites
List all Sharepoint sites in the connected tenant. The site names from the response can be used as the site name when connecting a Sharepoint site. If site name is null in the response, then site name should be left null when connecting to the site.
This endpoint requires an additional Sharepoint scope: "Sites.Read.All". Include this scope along with the default Sharepoint scopes to list Sharepoint sites, connect to a site, and finally sync files from the site. The default Sharepoint scopes are: openid, offline_access, User.Read, Files.Read.All.
data_source_id: Data source needs to be specified if you have linked multiple Sharepoint accounts
cursor: Used for pagination. If next_cursor is returned in response, you need to pass it as the cursor in the next request
🛠️ Usage
$result = $carbon->integrations->listSharepointSites( data_source_id: 1, cursor: "string_example" );
⚙️ Parameters
data_source_id: int
cursor: string
🔄 Return
object
🌐 Endpoint
/integrations/sharepoint/sites/list GET
carbon.integrations.syncAzureBlobFiles
After optionally loading the items via /integrations/items/sync and /integrations/items/list, use the container name and file name as the ID in this endpoint to sync them into Carbon. Additional parameters below can associate data with the selected items or modify the sync behavior.
🛠️ Usage
$result = $carbon->integrations->syncAzureBlobFiles( ids: [ [ ] ], tags: [], chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, max_items_per_chunk: 1, set_page_as_boundary: false, data_source_id: 1, request_id: "string_example", use_ocr: false, parse_pdf_tables_with_ocr: false, file_sync_config: [ "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "store_file_only" => false, "skip_file_processing" => false, "parsed_text_format" => "PLAIN_TEXT", ] );
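Per the description above, each entry in ids pairs a container with a blob. A hedged sketch; the field names inside each entry are assumptions, so check the generated AzureBlobGetFileInput model for the real ones:
// Sync two specific blobs into Carbon. The container/file field names
// below are assumed from the endpoint description, not confirmed.
$result = $carbon->integrations->syncAzureBlobFiles(
    ids: [
        ["container_name" => "invoices", "file_name" => "2024/jan.pdf"],
        ["container_name" => "invoices", "file_name" => "2024/feb.pdf"],
    ],
    data_source_id: 1,
);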
⚙️ Parameters
ids: AzureBlobGetFileInput[]
tags: object
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
embedding_model:
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
max_items_per_chunk: int
Number of objects per chunk. For csv, tsv, xlsx, and json files only.
set_page_as_boundary: bool
data_source_id: int
request_id: string
use_ocr: bool
parse_pdf_tables_with_ocr: bool
file_sync_config: FileSyncConfigNullable
🔄 Return
🌐 Endpoint
/integrations/azure_blob_storage/files POST
carbon.integrations.syncAzureBlobStorage
This endpoint can be used to connect Azure Blob Storage.
For Azure Blob Storage, follow these steps:
- Create a new Azure Storage account and grant the following permissions:
- List containers.
- Read from specific containers and blobs to sync with Carbon. Ensure any future containers or blobs carry the same permissions.
- Generate a shared access signature (SAS) token or an access key for the storage account.
Once created, provide us with the following details to generate the connection URL:
- Storage Account KeyName.
- Storage Account Name.
🛠️ Usage
$result = $carbon->integrations->syncAzureBlobStorage( account_name: "string_example", account_key: "string_example", sync_source_items: true, data_source_tags: [] );
⚙️ Parameters
account_name: string
account_key: string
sync_source_items: bool
data_source_tags: object
Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.
🔄 Return
🌐 Endpoint
/integrations/azure_blob_storage POST
carbon.integrations.syncConfluence
This endpoint has been deprecated. Use /integrations/files/sync instead.
After listing pages in a user's Confluence account, the set of selected page ids and the connected account's data_source_id can be passed into this endpoint to sync them into Carbon. The additional parameters listed below can be used to associate data with the selected pages or alter the behavior of the sync.
🛠️ Usage
$result = $carbon->integrations->syncConfluence( data_source_id: 1, ids: [ "string_example" ], tags: [], chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, max_items_per_chunk: 1, set_page_as_boundary: false, request_id: "string_example", use_ocr: false, parse_pdf_tables_with_ocr: false, incremental_sync: false, file_sync_config: [ "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "store_file_only" => false, "skip_file_processing" => false, "parsed_text_format" => "PLAIN_TEXT", ] );
⚙️ Parameters
data_source_id: int
ids: IdsProperty
tags: object
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
embedding_model:
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
max_items_per_chunk: int
Number of objects per chunk. For csv, tsv, xlsx, and json files only.
set_page_as_boundary: bool
request_id: string
use_ocr: bool
parse_pdf_tables_with_ocr: bool
incremental_sync: bool
Only sync files if they have not already been synced or if the embedding properties have changed. This flag is currently supported by ONEDRIVE, GOOGLE_DRIVE, BOX, DROPBOX, INTERCOM, GMAIL, OUTLOOK, ZENDESK, CONFLUENCE, NOTION, SHAREPOINT, SERVICENOW. It will be ignored for other data sources.
file_sync_config: FileSyncConfigNullable
🔄 Return
🌐 Endpoint
/integrations/confluence/sync POST
carbon.integrations.syncDataSourceItems
Sync Data Source Items
🛠️ Usage
$result = $carbon->integrations->syncDataSourceItems( data_source_id: 1 );
⚙️ Parameters
data_source_id: int
🔄 Return
🌐 Endpoint
/integrations/items/sync POST
carbon.integrations.syncFiles
After listing files and folders via /integrations/items/sync and /integrations/items/list, use the selected items' external ids as the ids in this endpoint to sync them into Carbon. Sharepoint items take an additional parameter root_id, which identifies the drive the file or folder is in and is stored in root_external_id. That additional parameter is optional; excluding it tells the sync to assume the item is stored in the default Documents drive.
🛠️ Usage
$result = $carbon->integrations->syncFiles( data_source_id: 1, ids: [ "string_example" ], tags: [], chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, max_items_per_chunk: 1, set_page_as_boundary: false, request_id: "string_example", use_ocr: false, parse_pdf_tables_with_ocr: false, incremental_sync: false, file_sync_config: [ "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "store_file_only" => false, "skip_file_processing" => false, "parsed_text_format" => "PLAIN_TEXT", ] );
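For a Sharepoint item outside the default Documents drive, the description above says an id can carry a root_id. A sketch; the object shape of each id entry is an assumption here (see the IdsProperty model for the actual structure):
// Sync a Sharepoint file that lives in a non-default drive. The
// ["id" => ..., "root_id" => ...] shape is assumed, not confirmed.
$result = $carbon->integrations->syncFiles(
    data_source_id: 1,
    ids: [
        ["id" => "external-item-id", "root_id" => "drive-root-external-id"],
    ],
);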
⚙️ Parameters
data_source_id: int
ids: IdsProperty
tags: object
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
embedding_model:
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
max_items_per_chunk: int
Number of objects per chunk. For csv, tsv, xlsx, and json files only.
set_page_as_boundary: bool
request_id: string
use_ocr: bool
parse_pdf_tables_with_ocr: bool
incremental_sync: bool
Only sync files if they have not already been synced or if the embedding properties have changed. This flag is currently supported by ONEDRIVE, GOOGLE_DRIVE, BOX, DROPBOX, INTERCOM, GMAIL, OUTLOOK, ZENDESK, CONFLUENCE, NOTION, SHAREPOINT, SERVICENOW. It will be ignored for other data sources.
file_sync_config: FileSyncConfigNullable
🔄 Return
🌐 Endpoint
/integrations/files/sync POST
carbon.integrations.syncGitHub
Refer to this article to obtain an access token: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens. Make sure that your access token has permission to read content from your desired repos. Note that if your access token expires, you will need to manually update it through this endpoint.
🛠️ Usage
$result = $carbon->integrations->syncGitHub( username: "string_example", access_token: "string_example", sync_source_items: false, data_source_tags: [] );
⚙️ Parameters
username: string
access_token: string
sync_source_items: bool
Enabling this flag will fetch all available content from the source so it can be listed via the list items endpoint.
data_source_tags: object
Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.
🔄 Return
🌐 Endpoint
/integrations/github POST
carbon.integrations.syncGitbook
You can sync up to 20 Gitbook spaces at a time using this endpoint. The additional parameters below can be used to associate data with the synced pages or modify the sync behavior.
🛠️ Usage
$result = $carbon->integrations->syncGitbook( space_ids: [ "string_example" ], data_source_id: 1, tags: [], chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, request_id: "string_example", file_sync_config: [ "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "store_file_only" => false, "skip_file_processing" => false, "parsed_text_format" => "PLAIN_TEXT", ] );
⚙️ Parameters
space_ids: string[]
data_source_id: int
tags: object
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
embedding_model:
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
request_id: string
file_sync_config: FileSyncConfigNullable
🔄 Return
object
🌐 Endpoint
/integrations/gitbook/sync POST
carbon.integrations.syncGmail
Once you have successfully connected your Gmail account, you can choose which emails to sync with us using the filters parameter. Filters is a JSON object with key-value pairs. It also supports AND and OR operations. For now, we support the limited set of keys listed below.
label: Inbuilt Gmail labels, for example "Important" or a custom label you created.
after or before: A date in YYYY/mm/dd format (example 2023/12/31). Gets emails after/before a certain date.
You can also use them in combination to get emails from a certain period.
is: Can have the following values - starred, important, snoozed, and unread
from: Email address of the sender
to: Email address of the recipient
in: Can have the following values - sent (sync emails sent by the user)
has: Can have the following values - attachment (sync emails that have attachments)
Using keys or values outside of the specified set can lead to unexpected behavior.
An example of a basic query with filters can be
{
"filters": {
"key": "label",
"value": "Test"
}
}
This will list all emails that have the label "Test".
You can use AND and OR operation in the following way:
{
"filters": {
"AND": [
{
"key": "after",
"value": "2024/01/07"
},
{
"OR": [
{
"key": "label",
"value": "Personal"
},
{
"key": "is",
"value": "starred"
}
]
}
]
}
}
This will return emails after the 7th of Jan that are either starred or have the label "Personal". Note that this is the highest level of nesting we support, i.e. you can't add more AND/OR filters within the OR filter in the above example.
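Translated into the PHP SDK, that nested filter is passed as a plain array that mirrors the JSON above one-to-one; everything else is left at its default:
// Sync starred emails, or emails labeled "Personal", received after
// 2024/01/07.
$result = $carbon->integrations->syncGmail(
    filters: [
        "AND" => [
            ["key" => "after", "value" => "2024/01/07"],
            [
                "OR" => [
                    ["key" => "label", "value" => "Personal"],
                    ["key" => "is", "value" => "starred"],
                ],
            ],
        ],
    ],
);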
🛠️ Usage
$result = $carbon->integrations->syncGmail( filters: [], tags: [], chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, data_source_id: 1, request_id: "string_example", sync_attachments: false, file_sync_config: [ "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "store_file_only" => false, "skip_file_processing" => false, "parsed_text_format" => "PLAIN_TEXT", ], incremental_sync: false );
⚙️ Parameters
filters: object
tags: object
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
embedding_model:
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
data_source_id: int
request_id: string
sync_attachments: bool
file_sync_config: FileSyncConfigNullable
incremental_sync: bool
🔄 Return
🌐 Endpoint
/integrations/gmail/sync POST
carbon.integrations.syncOutlook
Once you have successfully connected your Outlook account, you can choose which emails to sync with us using the filters and folder parameters. "folder" should be the folder you want to sync from Outlook. By default we get messages from your inbox folder.
Filters is a JSON object with key-value pairs. It also supports AND and OR operations. For now, we support the limited set of keys listed below.
category: Custom categories that you created in Outlook.
after or before: A date in YYYY/mm/dd format (example 2023/12/31). Gets emails after/before a certain date. You can also use them in combination to get emails from a certain period.
is: Can have the following values: flagged
from: Email address of the sender
An example of a basic query with filters can be
{
"filters": {
"key": "category",
"value": "Test"
}
}
This will list all emails that have the category "Test".
Specifying a custom folder in the same query:
{
"folder": "Folder Name",
"filters": {
"key": "category",
"value": "Test"
}
}
You can use AND and OR operation in the following way:
{
"filters": {
"AND": [
{
"key": "after",
"value": "2024/01/07"
},
{
"OR": [
{
"key": "category",
"value": "Personal"
},
{
"key": "category",
"value": "Test"
}
]
}
]
}
}
This will return emails after the 7th of Jan that have either Personal or Test as the category. Note that this is the highest level of nesting we support, i.e. you can't add more AND/OR filters within the OR filter in the above example.
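In the PHP SDK the same filter, plus a custom folder, looks like this (the array mirrors the JSON example above):
// Sync mail from a custom folder, keeping only items categorized
// "Personal" or "Test" and received after 2024/01/07.
$result = $carbon->integrations->syncOutlook(
    folder: "Folder Name",
    filters: [
        "AND" => [
            ["key" => "after", "value" => "2024/01/07"],
            [
                "OR" => [
                    ["key" => "category", "value" => "Personal"],
                    ["key" => "category", "value" => "Test"],
                ],
            ],
        ],
    ],
);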
🛠️ Usage
$result = $carbon->integrations->syncOutlook( filters: [], tags: [], folder: "Inbox", chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, data_source_id: 1, request_id: "string_example", sync_attachments: false, file_sync_config: [ "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "store_file_only" => false, "skip_file_processing" => false, "parsed_text_format" => "PLAIN_TEXT", ], incremental_sync: false );
⚙️ Parameters
filters: object
tags: object
folder: string
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
embedding_model:
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
data_source_id: int
request_id: string
sync_attachments: bool
file_sync_config: FileSyncConfigNullable
incremental_sync: bool
🔄 Return
🌐 Endpoint
/integrations/outlook/sync POST
carbon.integrations.syncRepos
You can retrieve repos your token has access to using /integrations/github/repos and sync their content. You can also pass the full name of any public repository (username/repo-name). This will store the repo content with Carbon, which can be accessed through the /integrations/items/list endpoint. A maximum of 25 repositories is accepted per request.
🛠️ Usage
$result = $carbon->integrations->syncRepos( repos: [ "string_example" ], data_source_id: 1 );
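For example, syncing a well-known public repository by its full name alongside one of your own (the second repo name is illustrative):
$result = $carbon->integrations->syncRepos(
    repos: ["octocat/Hello-World", "your-org/internal-docs"],  // username/repo-name format
    data_source_id: 1,
);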
⚙️ Parameters
repos: string[]
data_source_id: int
🔄 Return
object
🌐 Endpoint
/integrations/github/sync_repos POST
carbon.integrations.syncRssFeed
Rss Feed
🛠️ Usage
$result = $carbon->integrations->syncRssFeed( url: "string_example", tags: [], chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, request_id: "string_example", data_source_tags: [] );
⚙️ Parameters
url: string
tags: object
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
embedding_model:
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
request_id: string
data_source_tags: object
Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.
🔄 Return
🌐 Endpoint
/integrations/rss_feed POST
carbon.integrations.syncS3Files
After optionally loading the items via /integrations/items/sync and /integrations/items/list, use the bucket name and object key as the ID in this endpoint to sync them into Carbon. The additional parameters below can associate data with the selected items or modify the sync behavior.
🛠️ Usage
$result = $carbon->integrations->syncS3Files( ids: [ [ ] ], tags: [], chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, max_items_per_chunk: 1, set_page_as_boundary: false, data_source_id: 1, request_id: "string_example", use_ocr: false, parse_pdf_tables_with_ocr: false, file_sync_config: [ "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "store_file_only" => false, "skip_file_processing" => false, "parsed_text_format" => "PLAIN_TEXT", ] );
⚙️ Parameters
ids: S3GetFileInput[]
Each input should be one of the following: A bucket name, a bucket name and a prefix, or a bucket name and an object key. A prefix is the common path for all objects you want to sync. Paths should end with a forward slash.
tags: object
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
embedding_model:
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
max_items_per_chunk: int
Number of objects per chunk. For csv, tsv, xlsx, and json files only.
set_page_as_boundary: bool
data_source_id: int
request_id: string
use_ocr: bool
parse_pdf_tables_with_ocr: bool
file_sync_config: FileSyncConfigNullable
🔄 Return
🌐 Endpoint
/integrations/s3/files POST
carbon.integrations.syncSlack
You can list all conversations using the endpoint /integrations/slack/conversations. The ID of the conversation will be used as an input for this endpoint, with timestamps as optional filters.
🛠️ Usage
$result = $carbon->integrations->syncSlack( filters: [ "conversation_id" => "conversation_id_example", ], tags: [], chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, data_source_id: 1, request_id: "string_example" );
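A typical flow pairs this with listConversations. A sketch; the parameter-less listConversations call and the way a conversation ID is read out of its response are assumptions, so check the actual method signature and response shape:
// 1) Discover conversations in the connected Slack workspace.
$conversations = $carbon->integrations->listConversations();
// 2) Sync one of them by ID (a value taken from the response above).
$result = $carbon->integrations->syncSlack(
    filters: ["conversation_id" => "C0123456789"],
    data_source_id: 1,
);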
⚙️ Parameters
filters: SlackFilters
tags: object
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
embedding_model:
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
data_source_id: int
request_id: string
🔄 Return
object
🌐 Endpoint
/integrations/slack/sync POST
carbon.organizations.get
Get Organization
🛠️ Usage
$result = $carbon->organizations->get();
🔄 Return
🌐 Endpoint
/organization GET
carbon.organizations.update
Update Organization
🛠️ Usage
$result = $carbon->organizations->update( global_user_config: [ ], data_source_configs: [ "key" => [ "allowed_file_formats" => [], ], ] );
⚙️ Parameters
global_user_config: UserConfigurationNullable
data_source_configs: array<string, DataSourceConfiguration>
Used to set organization level defaults for configuration related to data sources.
🔄 Return
🌐 Endpoint
/organization/update POST
carbon.organizations.updateStats
Use this endpoint to reaggregate the statistics for an organization, for example aggregate_file_size. The reaggregation process is asynchronous, so a webhook will be sent with the event type FILE_STATISTICS_AGGREGATED to notify you when the process is complete. After this aggregation is complete, the updated statistics can be retrieved using the /organization endpoint. The response of /organization will also contain a timestamp of the last time the statistics were reaggregated.
🛠️ Usage
$result = $carbon->organizations->updateStats();
🔄 Return
🌐 Endpoint
/organization/statistics POST
carbon.users.all
List users within an organization
🛠️ Usage
$result = $carbon->users->all( pagination: [ "limit" => 10, "offset" => 0, "starting_id" => 0, ], filters: [ ], order_by: "created_at", order_dir: "asc", include_count: false );
⚙️ Parameters
pagination: Pagination
filters: ListUsersFilters
order_by:
order_dir:
include_count: bool
🔄 Return
🌐 Endpoint
/list_users POST
carbon.users.delete
Delete Users
🛠️ Usage
$result = $carbon->users->delete( customer_ids: [ "string_example" ] );
⚙️ Parameters
customer_ids: string[]
🔄 Return
🌐 Endpoint
/delete_users POST
carbon.users.get
User Endpoint
🛠️ Usage
$result = $carbon->users->get( customer_id: "string_example" );
⚙️ Parameters
customer_id: string
🔄 Return
🌐 Endpoint
/user POST
carbon.users.toggleUserFeatures
Toggle User Features
🛠️ Usage
$result = $carbon->users->toggleUserFeatures( configuration_key_name: "sparse_vectors", value: [] );
⚙️ Parameters
configuration_key_name:
value: object
🔄 Return
🌐 Endpoint
/modify_user_configuration POST
carbon.users.updateUsers
Update Users
🛠️ Usage
$result = $carbon->users->updateUsers( customer_ids: [ "string_example" ], auto_sync_enabled_sources: [ "string_example" ], max_files: -1, max_files_per_upload: -1, max_characters: -1, max_characters_per_file: -1, max_characters_per_upload: -1, auto_sync_interval: -1 );
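As a concrete example, capping one user at 500 files while explicitly removing their overall character limit (per the parameter notes below, -1 means no limit; "customer_123" is an illustrative ID):
$result = $carbon->users->updateUsers(
    customer_ids: ["customer_123"],
    max_files: 500,        // hard cap across all of the user's uploads
    max_characters: -1,    // -1 removes the character limit
);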
⚙️ Parameters
customer_ids: string[]
List of organization supplied user IDs
auto_sync_enabled_sources: AutoSyncEnabledSourcesProperty
max_files: int
Custom file upload limit for the user over all user's files across all uploads. If set, then the user will not be allowed to upload more files than this limit. If not set, or if set to -1, then the user will have no limit.
max_files_per_upload: int
Custom file upload limit for the user across a single upload. If set, then the user will not be allowed to upload more files than this limit in a single upload. If not set, or if set to -1, then the user will have no limit.
max_characters: int
Custom character upload limit for the user over all user's files across all uploads. If set, then the user will not be allowed to upload more characters than this limit. If not set, or if set to -1, then the user will have no limit.
max_characters_per_file: int
A single file upload from the user can not exceed this character limit. If set, then the file will not be synced if it exceeds this limit. If not set, or if set to -1, then the user will have no limit.
max_characters_per_upload: int
Custom character upload limit for the user across a single upload. If set, then the user won't be able to sync more than this many characters in one upload. If not set, or if set to -1, then the user will have no limit.
auto_sync_interval: int
The interval in hours at which the user's data sources should be synced. If not set or set to -1, the user will be synced at the organization level interval or default interval if that is also not set. Must be one of [3, 6, 12, 24]
🔄 Return
🌐 Endpoint
/update_users POST
carbon.users.whoAmI
Me Endpoint
🛠️ Usage
$result = $carbon->users->whoAmI();
🔄 Return
🌐 Endpoint
/whoami GET
carbon.utilities.fetchUrls
Extracts all URLs from a webpage.
Args: url (str): URL of the webpage
Returns: FetchURLsResponse: A response object with a list of URLs extracted from the webpage and the webpage content.
🛠️ Usage
$result = $carbon->utilities->fetchUrls( url: "url_example" );
⚙️ Parameters
url: string
🔄 Return
🌐 Endpoint
/fetch_urls GET
carbon.utilities.fetchWebpage
Fetch Urls V2
🛠️ Usage
$result = $carbon->utilities->fetchWebpage( url: "string_example" );
⚙️ Parameters
url: string
🔄 Return
object
🌐 Endpoint
/fetch_webpage POST
carbon.utilities.fetchYoutubeTranscripts
Fetches English transcripts from YouTube videos.
Args: id (str): The ID of the YouTube video. raw (bool): Whether to return the raw transcript or not. Defaults to False.
Returns: dict: A dictionary with the transcript of the YouTube video.
🛠️ Usage
$result = $carbon->utilities->fetchYoutubeTranscripts( id: "id_example", raw: false );
⚙️ Parameters
id: string
raw: bool
🔄 Return
🌐 Endpoint
/fetch_youtube_transcript GET
carbon.utilities.processSitemap
Retrieves all URLs from a sitemap, which can subsequently be utilized with our web_scrape endpoint.
🛠️ Usage
$result = $carbon->utilities->processSitemap( url: "url_example" );
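The URLs this returns are meant to be fed into web_scrape. A sketch of that pipeline; the "urls" key and array-style access on the response are assumptions, so inspect the actual return object first:
// Pull every URL out of a sitemap, then scrape the first few of them.
$sitemap = $carbon->utilities->processSitemap( url: "https://example.com/sitemap.xml" );
$urls = $sitemap["urls"] ?? [];  // assumed field name
foreach (array_slice($urls, 0, 5) as $url) {
    $carbon->utilities->scrapeWeb( body: [ [ "url" => $url ] ] );
}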
⚙️ Parameters
url: string
🔄 Return
object
🌐 Endpoint
/process_sitemap GET
carbon.utilities.scrapeSitemap
Extracts all URLs from a sitemap and performs a web scrape on each of them.
Args: sitemap_url (str): URL of the sitemap
Returns: dict: A response object with the status message of the scraping job.
🛠️ Usage
$result = $carbon->utilities->scrapeSitemap( url: "string_example", tags: [ "key" => "string_example", ], max_pages_to_scrape: 1, chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, enable_auto_sync: false, generate_sparse_vectors: false, prepend_filename_to_chunks: false, html_tags_to_skip: [], css_classes_to_skip: [], css_selectors_to_skip: [], embedding_model: "OPENAI", url_paths_to_include: [], url_paths_to_exclude: [], urls_to_scrape: [], download_css_and_media: false, generate_chunks_only: false, store_file_only: false, use_premium_proxies: false );
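A more focused call than the full example above: scrape only the /questions/ subtree of a site, as described under url_paths_to_include below (the sitemap URL is illustrative):
// Only sitemap URLs whose path starts with /questions/ will be scraped.
$result = $carbon->utilities->scrapeSitemap(
    url: "https://stackoverflow.com/sitemap.xml",
    url_paths_to_include: ["/questions/"],
    max_pages_to_scrape: 100,
);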
⚙️ Parameters
url: string
tags: array<string, Tags1>
max_pages_to_scrape: int
chunk_size: int
chunk_overlap: int
skip_embedding_generation: bool
enable_auto_sync: bool
generate_sparse_vectors: bool
prepend_filename_to_chunks: bool
html_tags_to_skip: string[]
css_classes_to_skip: string[]
css_selectors_to_skip: string[]
embedding_model:
url_paths_to_include: string[]
URL subpaths or directories that you want to include. For example, if you only want to include URLs that start with /questions on stackoverflow.com, add /questions/ to this input.
url_paths_to_exclude: string[]
URL subpaths or directories that you want to exclude. For example, if you want to exclude URLs that start with /questions on stackoverflow.com, add /questions/ to this input.
urls_to_scrape: string[]
You can submit a subset of URLs from the sitemap that should be scraped. To get the list of URLs, you can check out /process_sitemap endpoint. If left empty, all URLs from the sitemap will be scraped.
download_css_and_media: bool
Whether the scraper should download CSS and media from the page (images, fonts, etc). Scrapes might take longer to finish with this flag enabled, but the success rate is improved.
generate_chunks_only: bool
If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag.
store_file_only: bool
If this flag is enabled, the file will be stored with Carbon, but no processing will be done.
use_premium_proxies: bool
If the default proxies are blocked and not returning results, this flag can be enabled to use alternate proxies (residential and office). Scrapes might take longer to finish with this flag enabled.
🔄 Return
object
🌐 Endpoint
/scrape_sitemap POST
carbon.utilities.scrapeWeb
Conduct a web scrape on a given webpage URL. Our web scraper is fully compatible with JavaScript and supports recursion depth, enabling you to efficiently extract all content from the target website.
🛠️ Usage
$result = $carbon->utilities->scrapeWeb( body: [ [ "url" => "url_example", "recursion_depth" => 3, "max_pages_to_scrape" => 100, "chunk_size" => 1500, "chunk_overlap" => 20, "skip_embedding_generation" => false, "enable_auto_sync" => false, "generate_sparse_vectors" => false, "prepend_filename_to_chunks" => false, "html_tags_to_skip" => [], "css_classes_to_skip" => [], "css_selectors_to_skip" => [], "embedding_model" => "OPENAI", "url_paths_to_include" => [], "download_css_and_media" => false, "generate_chunks_only" => false, "store_file_only" => false, "use_premium_proxies" => false, ] ], );
⚙️ Request Body
🔄 Return
object
🌐 Endpoint
/web_scrape POST
carbon.utilities.searchUrls
Perform a web search and obtain a list of relevant URLs.
As an illustration, when you perform a search for “content related to MRNA,” you will receive a list of links such as the following:
- https://tomrenz.substack.com/p/mrna-and-why-it-matters
- https://www.statnews.com/2020/11/10/the-story-of-mrna-how-a-once-dismissed-idea-became-a-leading-technology-in-the-covid-vaccine-race/
- https://www.statnews.com/2022/11/16/covid-19-vaccines-were-a-success-but-mrna-still-has-a-delivery-problem/
- https://joomi.substack.com/p/were-still-being-misled-about-how
Subsequently, you can submit these links to the web_scrape endpoint in order to retrieve the content of the respective web pages.
Args: query (str): Query to search for
Returns: FetchURLsResponse: A response object with a list of URLs for a given search query.
🛠️ Usage
$result = $carbon->utilities->searchUrls( query: "query_example" );
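Chaining this with web_scrape, as the description suggests. A sketch; the "urls" key and array-style access on the response are assumptions (see FetchURLsResponse for the actual accessor):
// Search, then scrape each result into Carbon.
$search = $carbon->utilities->searchUrls( query: "content related to MRNA" );
foreach (($search["urls"] ?? []) as $url) {  // assumed field name
    $carbon->utilities->scrapeWeb( body: [ [ "url" => $url ] ] );
}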
⚙️ Parameters
query: string
🔄 Return
🌐 Endpoint
/search_urls GET
carbon.utilities.userWebpages
User Web Pages
🛠️ Usage
$result = $carbon->utilities->userWebpages( filters: [ ], pagination: [ "limit" => 10, "offset" => 0, "starting_id" => 0, ], order_by: "created_at", order_dir: "asc" );
⚙️ Parameters
filters: UserWebPagesFilters
pagination: Pagination
order_by:
order_dir:
🔄 Return
object
🌐 Endpoint
/user_webpages POST
carbon.webhooks.addUrl
Add Webhook Url
🛠️ Usage
$result = $carbon->webhooks->addUrl( url: "string_example" );
⚙️ Parameters
url: string
🔄 Return
🌐 Endpoint
/add_webhook POST
carbon.webhooks.deleteUrl
Delete Webhook Url
🛠️ Usage
$result = $carbon->webhooks->deleteUrl( webhook_id: 1 );
⚙️ Parameters
webhook_id: int
🔄 Return
🌐 Endpoint
/delete_webhook/{webhook_id} DELETE
carbon.webhooks.urls
Webhook Urls
🛠️ Usage
$result = $carbon->webhooks->urls( pagination: [ "limit" => 10, "offset" => 0, "starting_id" => 0, ], order_by: "created_at", order_dir: "desc", filters: [ "ids" => [], ] );
⚙️ Parameters
pagination: Pagination
order_by:
order_dir:
filters: WebhookFilters
🔄 Return
🌐 Endpoint
/webhooks POST
carbon.whiteLabel.all
List White Labels
🛠️ Usage
$result = $carbon->whiteLabel->all( pagination: [ "limit" => 10, "offset" => 0, "starting_id" => 0, ], order_by: "created_at", order_dir: "desc", filters: [ "ids" => [], "data_source_type" => [], ] );
⚙️ Parameters
pagination: Pagination
order_by:
order_dir:
filters: WhiteLabelFilters
🔄 Return
object
🌐 Endpoint
/white_label/list POST
carbon.whiteLabel.create
Create White Labels
🛠️ Usage
$result = $carbon->whiteLabel->create( body: [ [ "data_source_type" => "GOOGLE_DRIVE", "credentials" => [ "client_id" => "client_id_example", "redirect_uri" => "redirect_uri_example", ], ] ], );
⚙️ Request Body
WhiteLabelCreateRequestInner[]
🔄 Return
object
🌐 Endpoint
/white_label/create POST
carbon.whiteLabel.delete
Delete White Labels
🛠️ Usage
$result = $carbon->whiteLabel->delete( ids: [ 1 ] );
⚙️ Parameters
ids: int[]
🔄 Return
object
🌐 Endpoint
/white_label/delete POST
carbon.whiteLabel.update
Update White Label
🛠️ Usage
$result = $carbon->whiteLabel->update( body: [ "data_source_type" => "GOOGLE_DRIVE", "credentials" => [ "client_id" => "client_id_example", "redirect_uri" => "redirect_uri_example", ], ], data_source_type: "INTERCOM", credentials: [ "client_id" => "client_id_example", "redirect_uri" => "redirect_uri_example", ] );
⚙️ Parameters
data_source_type: string
credentials: Credentials
🔄 Return
object
🌐 Endpoint
/white_label/update POST
Author
This PHP package is automatically generated by Konfig
Statistics
- Total downloads: 6
- Monthly downloads: 0
- Daily downloads: 0
- Favorites: 0
- Hits: 0
- Dependents: 0
- Suggesters: 0
Other Information
- License: unlicense
- Last updated: 2024-03-01
