zxhlyh 2023-09-28 11:26:04 +08:00 committed by GitHub
parent 5e511e01bf
commit bcd744b6b7
3 changed files with 324 additions and 141 deletions


@ -71,7 +71,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
/> />
<Row> <Row>
<Col> <Col>
### Path Query ### Query
<Properties> <Properties>
<Property name='page' type='string' key='page'> <Property name='page' type='string' key='page'>
Page number Page number
@ -136,7 +136,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
<Col> <Col>
This api is based on an existing dataset and creates a new document through text based on this dataset. This api is based on an existing dataset and creates a new document through text based on this dataset.
### Path Params ### Params
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID Dataset ID
@ -153,22 +153,22 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
</Property> </Property>
<Property name='indexing_technique' type='string' key='indexing_technique'> <Property name='indexing_technique' type='string' key='indexing_technique'>
Index mode Index mode
- high_quality High quality: embedding using embedding model, built as vector database index - <code>high_quality</code> High quality: embedding using embedding model, built as vector database index
- economy Economy: Build using inverted index of Keyword Table Index - <code>economy</code> Economy: Build using inverted index of Keyword Table Index
</Property> </Property>
<Property name='process_rule' type='object' key='process_rule'> <Property name='process_rule' type='object' key='process_rule'>
Processing rules Processing rules
- mode (string) Cleaning, segmentation mode, automatic / custom - <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
- rules (text) Custom rules (in automatic mode, this field is empty) - <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
- pre_processing_rules (array[object]) Preprocessing rules - <code>pre_processing_rules</code> (array[object]) Preprocessing rules
- id (string) Unique identifier for the preprocessing rule - <code>id</code> (string) Unique identifier for the preprocessing rule
- enumerate - enumerate
- remove_extra_spaces Replace consecutive spaces, newlines, tabs - <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
- remove_urls_emails Delete URL, email address - <code>remove_urls_emails</code> Delete URL, email address
- enabled (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value. - <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- segmentation (object) segmentation rules - <code>segmentation</code> (object) segmentation rules
- separator Custom segment identifier, currently only allows one delimiter to be set. Default is \n - <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- max_tokens Maximum length (token) defaults to 1000 - <code>max_tokens</code> Maximum length (token) defaults to 1000
</Property> </Property>
</Properties> </Properties>
</Col> </Col>
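
Reviewer note: the `process_rule` property list in this hunk describes a nested object, which is easier to check when written out as types. The sketch below is only an illustration built from the field names in the list. The type names and the `buildProcessRule` helper are hypothetical, and reading "this field is empty" as "omit `rules`" in automatic mode is an assumption.

```typescript
// Hypothetical types; only the field names come from the property list above.
type PreProcessingRuleId = 'remove_extra_spaces' | 'remove_urls_emails';

interface PreProcessingRule {
  id: PreProcessingRuleId;
  enabled: boolean;
}

interface Segmentation {
  separator: string;   // documented default: "\n"
  max_tokens: number;  // documented default: 1000
}

interface ProcessRule {
  mode: 'automatic' | 'custom';
  // Assumption: "empty in automatic mode" is modeled by omitting the field.
  rules?: {
    pre_processing_rules: PreProcessingRule[];
    segmentation: Segmentation;
  };
}

// Hypothetical helper: no argument gives automatic mode,
// an argument gives custom mode with the documented defaults filled in.
function buildProcessRule(custom?: Partial<Segmentation>): ProcessRule {
  if (!custom) {
    return { mode: 'automatic' };
  }
  return {
    mode: 'custom',
    rules: {
      pre_processing_rules: [
        { id: 'remove_extra_spaces', enabled: true },
        { id: 'remove_urls_emails', enabled: false },
      ],
      segmentation: {
        separator: custom.separator ?? '\n',
        max_tokens: custom.max_tokens ?? 1000,
      },
    },
  };
}
```

Calling `buildProcessRule({ max_tokens: 500 })` yields a custom rule that keeps the default separator, while `buildProcessRule()` yields `{ mode: 'automatic' }`.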
@ -238,7 +238,8 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
<Row> <Row>
<Col> <Col>
This api is based on an existing dataset and creates a new document through a file based on this dataset. This api is based on an existing dataset and creates a new document through a file based on this dataset.
### Path Params
### Params
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID Dataset ID
@ -259,22 +260,22 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
</Property> </Property>
<Property name='indexing_technique' type='string' key='indexing_technique'> <Property name='indexing_technique' type='string' key='indexing_technique'>
Index mode Index mode
- high_quality High quality: embedding using embedding model, built as vector database index - <code>high_quality</code> High quality: embedding using embedding model, built as vector database index
- economy Economy: Build using inverted index of Keyword Table Index - <code>economy</code> Economy: Build using inverted index of Keyword Table Index
</Property> </Property>
<Property name='process_rule' type='object' key='process_rule'> <Property name='process_rule' type='object' key='process_rule'>
Processing rules Processing rules
- mode (string) Cleaning, segmentation mode, automatic / custom - <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
- rules (text) Custom rules (in automatic mode, this field is empty) - <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
- pre_processing_rules (array[object]) Preprocessing rules - <code>pre_processing_rules</code> (array[object]) Preprocessing rules
- id (string) Unique identifier for the preprocessing rule - <code>id</code> (string) Unique identifier for the preprocessing rule
- enumerate - enumerate
- remove_extra_spaces Replace consecutive spaces, newlines, tabs - <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
- remove_urls_emails Delete URL, email address - <code>remove_urls_emails</code> Delete URL, email address
- enabled (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value. - <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- segmentation (object) segmentation rules - <code>segmentation</code> (object) segmentation rules
- separator Custom segment identifier, currently only allows one delimiter to be set. Default is \n - <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- max_tokens Maximum length (token) defaults to 1000 - <code>max_tokens</code> Maximum length (token) defaults to 1000
</Property> </Property>
</Properties> </Properties>
</Col> </Col>
@ -338,7 +339,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
<Col> <Col>
This api is based on an existing dataset and updates the document through text based on this dataset. This api is based on an existing dataset and updates the document through text based on this dataset.
### Path Params ### Params
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID Dataset ID
@ -358,17 +359,17 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
</Property> </Property>
<Property name='process_rule' type='object' key='process_rule'> <Property name='process_rule' type='object' key='process_rule'>
Processing rules Processing rules
- mode (string) Cleaning, segmentation mode, automatic / custom - <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
- rules (text) Custom rules (in automatic mode, this field is empty) - <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
- pre_processing_rules (array[object]) Preprocessing rules - <code>pre_processing_rules</code> (array[object]) Preprocessing rules
- id (string) Unique identifier for the preprocessing rule - <code>id</code> (string) Unique identifier for the preprocessing rule
- enumerate - enumerate
- remove_extra_spaces Replace consecutive spaces, newlines, tabs - <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
- remove_urls_emails Delete URL, email address - <code>remove_urls_emails</code> Delete URL, email address
- enabled (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value. - <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- segmentation (object) segmentation rules - <code>segmentation</code> (object) segmentation rules
- separator Custom segment identifier, currently only allows one delimiter to be set. Default is \n - <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- max_tokens Maximum length (token) defaults to 1000 - <code>max_tokens</code> Maximum length (token) defaults to 1000
</Property> </Property>
</Properties> </Properties>
</Col> </Col>
@ -435,7 +436,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
<Col> <Col>
This api is based on an existing dataset, and updates documents through files based on this dataset This api is based on an existing dataset, and updates documents through files based on this dataset
### Path Params ### Params
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID Dataset ID
@ -455,17 +456,17 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
</Property> </Property>
<Property name='process_rule' type='object' key='process_rule'> <Property name='process_rule' type='object' key='process_rule'>
Processing rules Processing rules
- mode (string) Cleaning, segmentation mode, automatic / custom - <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
- rules (text) Custom rules (in automatic mode, this field is empty) - <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
- pre_processing_rules (array[object]) Preprocessing rules - <code>pre_processing_rules</code> (array[object]) Preprocessing rules
- id (string) Unique identifier for the preprocessing rule - <code>id</code> (string) Unique identifier for the preprocessing rule
- enumerate - enumerate
- remove_extra_spaces Replace consecutive spaces, newlines, tabs - <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
- remove_urls_emails Delete URL, email address - <code>remove_urls_emails</code> Delete URL, email address
- enabled (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value. - <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- segmentation (object) segmentation rules - <code>segmentation</code> (object) segmentation rules
- separator Custom segment identifier, currently only allows one delimiter to be set. Default is \n - <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- max_tokens Maximum length (token) defaults to 1000 - <code>max_tokens</code> Maximum length (token) defaults to 1000
</Property> </Property>
</Properties> </Properties>
</Col> </Col>
@ -527,7 +528,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
/> />
<Row> <Row>
<Col> <Col>
### Path Params ### Params
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID Dataset ID
@ -582,7 +583,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
/> />
<Row> <Row>
<Col> <Col>
### Path Params ### Params
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID Dataset ID
@ -624,14 +625,14 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
/> />
<Row> <Row>
<Col> <Col>
### Path Params ### Params
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID Dataset ID
</Property> </Property>
</Properties> </Properties>
### Path Query ### Query
<Properties> <Properties>
<Property name='keyword' type='string' key='keyword'> <Property name='keyword' type='string' key='keyword'>
Search keywords, currently only search document names(optional) Search keywords, currently only search document names(optional)
@ -699,7 +700,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
/> />
<Row> <Row>
<Col> <Col>
### Path Params ### Params
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID Dataset ID
@ -712,10 +713,9 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
### Request Body ### Request Body
<Properties> <Properties>
<Property name='segments' type='object list' key='segments'> <Property name='segments' type='object list' key='segments'>
segments (object list) Segmented content - <code>content</code> (text) Text content/question content, required
- content (text) Text content/question content, required - <code>answer</code> (text) Answer content, if the mode of the data set is qa mode, pass the value(optional)
- answer(text) Answer content, if the mode of the data set is qa mode, pass the value(optional) - <code>keywords</code> (list) Keywords(optional)
- keywords(list) Keywords(optional)
</Property> </Property>
</Properties> </Properties>
</Col> </Col>
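
Reviewer note: the reworked `segments` property lists `content` as required and `answer` and `keywords` as optional. A minimal sketch of a request body that respects this follows. The `Segment` type and `buildSegmentsBody` helper are hypothetical, and mapping `(text)` to `string` and `(list)` to `string[]` is an assumption.

```typescript
// Hypothetical shape of one entry in the `segments` list.
interface Segment {
  content: string;      // required
  answer?: string;      // optional, used when the dataset is in qa mode
  keywords?: string[];  // optional
}

// Hypothetical helper: rejects entries with blank content, then wraps the list.
function buildSegmentsBody(segments: Segment[]): { segments: Segment[] } {
  segments.forEach((segment, index) => {
    if (segment.content.trim() === '') {
      throw new Error(`segments[${index}].content is required`);
    }
  });
  return { segments };
}
```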
@ -778,14 +778,106 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
--- ---
Error message <Row>
- **document_indexing**: Document indexing failed <Col>
- **provider_not_initialize**: Embedding model is not configured ### Error message
- **not_found**, Document does not exist <Properties>
- **dataset_name_duplicate**: Duplicate dataset name <Property name='code' type='string' key='code'>
- **provider_quota_exceeded**: Model quota exceeds limit Error code
- **dataset_not_initialized**: The dataset has not been initialized yet </Property>
- **unsupported_file_type**: Unsupported file types. </Properties>
- Currently only supports, txt, markdown, md, pdf, html, htm, xlsx, docx, csv <Properties>
- **too_many_files**: There are too many files. Currently, only a single file is uploaded <Property name='status' type='number' key='status'>
- **file_too_large*: The file is too large, support below 15M based on you environment configuration Error status
</Property>
</Properties>
<Properties>
<Property name='message' type='string' key='message'>
Error message
</Property>
</Properties>
</Col>
<Col>
<CodeGroup title="Example">
```json {{ title: 'Response' }}
{
"code": "no_file_uploaded",
"message": "Please upload your file.",
"status": 400
}
```
</CodeGroup>
</Col>
</Row>
<table className="max-w-auto border-collapse border border-slate-400" style={{ maxWidth: 'none', width: 'auto' }}>
<thead style={{ background: '#f9fafc' }}>
<tr>
<th class="p-2 border border-slate-300">code</th>
<th class="p-2 border border-slate-300">status</th>
<th class="p-2 border border-slate-300">message</th>
</tr>
</thead>
<tbody>
<tr>
<td class="p-2 border border-slate-300">no_file_uploaded</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">Please upload your file.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">too_many_files</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">Only one file is allowed.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">file_too_large</td>
<td class="p-2 border border-slate-300">413</td>
<td class="p-2 border border-slate-300">File size exceeded.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">unsupported_file_type</td>
<td class="p-2 border border-slate-300">415</td>
<td class="p-2 border border-slate-300">File type not allowed.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">high_quality_dataset_only</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">Current operation only supports 'high-quality' datasets.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">dataset_not_initialized</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">The dataset is still being initialized or indexing. Please wait a moment.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">archived_document_immutable</td>
<td class="p-2 border border-slate-300">403</td>
<td class="p-2 border border-slate-300">The archived document is not editable.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">dataset_name_duplicate</td>
<td class="p-2 border border-slate-300">409</td>
<td class="p-2 border border-slate-300">The dataset name already exists. Please modify your dataset name.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">invalid_action</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">Invalid action.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">document_already_finished</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">The document has been processed. Please refresh the page or go to the document details.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">document_indexing</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">The document is being processed and cannot be edited.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">invalid_metadata</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">The metadata content is incorrect. Please check and verify.</td>
</tr>
</tbody>
</table>
<div class="pb-4" />
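
Reviewer note: the new error section documents a body with `code`, `status`, and `message`. A client could narrow an unknown response to that shape before branching on `code`, roughly as sketched below. The `ApiError` type and both function names are hypothetical, and treating `dataset_not_initialized` and `document_indexing` as "retry later" is an assumption drawn from the wording of their messages in the table.

```typescript
// Hypothetical type mirroring the three documented error properties.
interface ApiError {
  code: string;
  status: number;
  message: string;
}

// Type guard: true only when all three documented fields have the documented types.
function isApiError(value: unknown): value is ApiError {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const v = value as Record<string, unknown>;
  return (
    typeof v.code === 'string' &&
    typeof v.status === 'number' &&
    typeof v.message === 'string'
  );
}

// Hypothetical helper that turns an error body into a short description.
function describeError(body: unknown): string {
  if (!isApiError(body)) {
    return 'Unexpected response shape';
  }
  switch (body.code) {
    // Assumption: these two codes describe a temporary state.
    case 'dataset_not_initialized':
    case 'document_indexing':
      return `Retry later: ${body.message}`;
    default:
      return `${body.status} ${body.code}: ${body.message}`;
  }
}
```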


@ -71,7 +71,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
/> />
<Row> <Row>
<Col> <Col>
### Path Query ### Query
<Properties> <Properties>
<Property name='page' type='string' key='page'> <Property name='page' type='string' key='page'>
页码 页码
@ -136,7 +136,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
<Col> <Col>
此接口基于已存在数据集,在此数据集的基础上通过文本创建新的文档 此接口基于已存在数据集,在此数据集的基础上通过文本创建新的文档
### Path Params ### Path
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
数据集 ID 数据集 ID
@ -153,22 +153,22 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
</Property> </Property>
<Property name='indexing_technique' type='string' key='indexing_technique'> <Property name='indexing_technique' type='string' key='indexing_technique'>
索引方式 索引方式
- high_quality 高质量:使用 embedding 模型进行嵌入,构建为向量数据库索引 - <code>high_quality</code> 高质量:使用 embedding 模型进行嵌入,构建为向量数据库索引
- economy 经济:使用 Keyword Table Index 的倒排索引进行构建 - <code>economy</code> 经济:使用 Keyword Table Index 的倒排索引进行构建
</Property> </Property>
<Property name='process_rule' type='object' key='process_rule'> <Property name='process_rule' type='object' key='process_rule'>
处理规则 处理规则
- mode (string) 清洗、分段模式 automatic 自动 / custom 自定义 - <code>mode</code> (string) 清洗、分段模式 automatic 自动 / custom 自定义
- rules (text) 自定义规则(自动模式下,该字段为空) - <code>rules</code> (object) 自定义规则(自动模式下,该字段为空)
- pre_processing_rules (array[object]) 预处理规则 - <code>pre_processing_rules</code> (array[object]) 预处理规则
- id (string) 预处理规则的唯一标识符 - <code>id</code> (string) 预处理规则的唯一标识符
- 枚举: - 枚举:
- remove_extra_spaces 替换连续空格、换行符、制表符 - <code>remove_extra_spaces</code> 替换连续空格、换行符、制表符
- remove_urls_emails 删除 URL、电子邮件地址 - <code>remove_urls_emails</code> 删除 URL、电子邮件地址
- enabled (bool) 是否选中该规则,不传入文档 ID 时代表默认值 - <code>enabled</code> (bool) 是否选中该规则,不传入文档 ID 时代表默认值
- segmentation (object) 分段规则 - <code>segmentation</code> (object) 分段规则
- separator 自定义分段标识符,目前仅允许设置一个分隔符。默认为 \n - <code>separator</code> 自定义分段标识符,目前仅允许设置一个分隔符。默认为 \n
- max_tokens 最大长度 (token) 默认为 1000 - <code>max_tokens</code> 最大长度 (token) 默认为 1000
</Property> </Property>
</Properties> </Properties>
</Col> </Col>
@ -239,7 +239,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
<Col> <Col>
此接口基于已存在数据集,在此数据集的基础上通过文件创建新的文档 此接口基于已存在数据集,在此数据集的基础上通过文件创建新的文档
### Path Params ### Path
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
数据集 ID 数据集 ID
@ -252,30 +252,30 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
源文档 ID (选填) 源文档 ID (选填)
- 用于重新上传文档或修改文档清洗、分段配置,缺失的信息从源文档复制 - 用于重新上传文档或修改文档清洗、分段配置,缺失的信息从源文档复制
- 源文档不可为归档的文档 - 源文档不可为归档的文档
- 当传入 original_document_id 时代表文档进行更新操作process_rule 为可填项目,不填默认使用源文档的分段方式 - 当传入 <code>original_document_id</code> 时,代表文档进行更新操作,<code>process_rule</code> 为可填项目,不填默认使用源文档的分段方式
- 未传入 original_document_id 时代表文档进行新增操作process_rule 为必填 - 未传入 <code>original_document_id</code> 时,代表文档进行新增操作,<code>process_rule</code> 为必填
</Property> </Property>
<Property name='file' type='multipart/form-data' key='file'> <Property name='file' type='multipart/form-data' key='file'>
需要上传的文件。 需要上传的文件。
</Property> </Property>
<Property name='indexing_technique' type='string' key='indexing_technique'> <Property name='indexing_technique' type='string' key='indexing_technique'>
索引方式 索引方式
- high_quality 高质量:使用 embedding 模型进行嵌入,构建为向量数据库索引 - <code>high_quality</code> 高质量:使用 embedding 模型进行嵌入,构建为向量数据库索引
- economy 经济:使用 Keyword Table Index 的倒排索引进行构建 - <code>economy</code> 经济:使用 Keyword Table Index 的倒排索引进行构建
</Property> </Property>
<Property name='process_rule' type='object' key='process_rule'> <Property name='process_rule' type='object' key='process_rule'>
处理规则 处理规则
- mode (string) 清洗、分段模式 automatic 自动 / custom 自定义 - <code>mode</code> (string) 清洗、分段模式 automatic 自动 / custom 自定义
- rules (text) 自定义规则(自动模式下,该字段为空) - <code>rules</code> (object) 自定义规则(自动模式下,该字段为空)
- pre_processing_rules (array[object]) 预处理规则 - <code>pre_processing_rules</code> (array[object]) 预处理规则
- id (string) 预处理规则的唯一标识符 - <code>id</code> (string) 预处理规则的唯一标识符
- 枚举: - 枚举:
- remove_extra_spaces 替换连续空格、换行符、制表符 - <code>remove_extra_spaces</code> 替换连续空格、换行符、制表符
- remove_urls_emails 删除 URL、电子邮件地址 - <code>remove_urls_emails</code> 删除 URL、电子邮件地址
- enabled (bool) 是否选中该规则,不传入文档 ID 时代表默认值 - <code>enabled</code> (bool) 是否选中该规则,不传入文档 ID 时代表默认值
- segmentation (object) 分段规则 - <code>segmentation</code> (object) 分段规则
- separator 自定义分段标识符,目前仅允许设置一个分隔符默认为 \n - <code>separator</code> 自定义分段标识符,目前仅允许设置一个分隔符默认为 \n
- max_tokens 最大长度 (token) 默认为 1000 - <code>max_tokens</code> 最大长度 (token) 默认为 1000
</Property> </Property>
</Properties> </Properties>
</Col> </Col>
@ -339,7 +339,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
<Col> <Col>
此接口基于已存在数据集,在此数据集的基础上通过文本更新文档 此接口基于已存在数据集,在此数据集的基础上通过文本更新文档
### Path Params ### Path
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
数据集 ID 数据集 ID
@ -359,17 +359,17 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
</Property> </Property>
<Property name='process_rule' type='object' key='process_rule'> <Property name='process_rule' type='object' key='process_rule'>
处理规则(选填) 处理规则(选填)
- mode (string) 清洗、分段模式 automatic 自动 / custom 自定义 - <code>mode</code> (string) 清洗、分段模式 automatic 自动 / custom 自定义
- rules (text) 自定义规则(自动模式下,该字段为空) - <code>rules</code> (object) 自定义规则(自动模式下,该字段为空)
- pre_processing_rules (array[object]) 预处理规则 - <code>pre_processing_rules</code> (array[object]) 预处理规则
- id (string) 预处理规则的唯一标识符 - <code>id</code> (string) 预处理规则的唯一标识符
- 枚举: - 枚举:
- remove_extra_spaces 替换连续空格、换行符、制表符 - <code>remove_extra_spaces</code> 替换连续空格、换行符、制表符
- remove_urls_emails 删除 URL、电子邮件地址 - <code>remove_urls_emails</code> 删除 URL、电子邮件地址
- enabled (bool) 是否选中该规则,不传入文档 ID 时代表默认值 - <code>enabled</code> (bool) 是否选中该规则,不传入文档 ID 时代表默认值
- segmentation (object) 分段规则 - <code>segmentation</code> (object) 分段规则
- separator 自定义分段标识符,目前仅允许设置一个分隔符。默认为 \n - <code>separator</code> 自定义分段标识符,目前仅允许设置一个分隔符。默认为 \n
- max_tokens 最大长度 (token) 默认为 1000 - <code>max_tokens</code> 最大长度 (token) 默认为 1000
</Property> </Property>
</Properties> </Properties>
</Col> </Col>
@ -436,7 +436,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
<Col> <Col>
此接口基于已存在数据集,在此数据集的基础上通过文件更新文档的操作。 此接口基于已存在数据集,在此数据集的基础上通过文件更新文档的操作。
### Path Params ### Path
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
数据集 ID 数据集 ID
@ -456,17 +456,17 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
</Property> </Property>
<Property name='process_rule' type='object' key='process_rule'> <Property name='process_rule' type='object' key='process_rule'>
处理规则(选填) 处理规则(选填)
- mode (string) 清洗、分段模式 automatic 自动 / custom 自定义 - <code>mode</code> (string) 清洗、分段模式 automatic 自动 / custom 自定义
- rules (text) 自定义规则(自动模式下,该字段为空) - <code>rules</code> (object) 自定义规则(自动模式下,该字段为空)
- pre_processing_rules (array[object]) 预处理规则 - <code>pre_processing_rules</code> (array[object]) 预处理规则
- id (string) 预处理规则的唯一标识符 - <code>id</code> (string) 预处理规则的唯一标识符
- 枚举: - 枚举:
- remove_extra_spaces 替换连续空格、换行符、制表符 - <code>remove_extra_spaces</code> 替换连续空格、换行符、制表符
- remove_urls_emails 删除 URL、电子邮件地址 - <code>remove_urls_emails</code> 删除 URL、电子邮件地址
- enabled (bool) 是否选中该规则,不传入文档 ID 时代表默认值 - <code>enabled</code> (bool) 是否选中该规则,不传入文档 ID 时代表默认值
- segmentation (object) 分段规则 - <code>segmentation</code> (object) 分段规则
- separator 自定义分段标识符,目前仅允许设置一个分隔符默认为 \n - <code>separator</code> 自定义分段标识符,目前仅允许设置一个分隔符默认为 \n
- max_tokens 最大长度 (token) 默认为 1000 - <code>max_tokens</code> 最大长度 (token) 默认为 1000
</Property> </Property>
</Properties> </Properties>
</Col> </Col>
@ -528,7 +528,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
/> />
<Row> <Row>
<Col> <Col>
### Path Params ### Path
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
数据集 ID 数据集 ID
@ -583,7 +583,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
/> />
<Row> <Row>
<Col> <Col>
### Path Params ### Path
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
数据集 ID 数据集 ID
@ -625,14 +625,14 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
/> />
<Row> <Row>
<Col> <Col>
### Path Params ### Path
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
数据集 ID 数据集 ID
</Property> </Property>
</Properties> </Properties>
### Path Query ### Query
<Properties> <Properties>
<Property name='keyword' type='string' key='keyword'> <Property name='keyword' type='string' key='keyword'>
搜索关键词,可选,目前仅搜索文档名称 搜索关键词,可选,目前仅搜索文档名称
@ -700,7 +700,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
/> />
<Row> <Row>
<Col> <Col>
### Path Params ### Path
<Properties> <Properties>
<Property name='dataset_id' type='string' key='dataset_id'> <Property name='dataset_id' type='string' key='dataset_id'>
数据集 ID 数据集 ID
@ -713,10 +713,9 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
### Request Body ### Request Body
<Properties> <Properties>
<Property name='segments' type='object list' key='segments'> <Property name='segments' type='object list' key='segments'>
segments (object list) 分段内容 - <code>content</code> (text) 文本内容/问题内容,必填
- content (text) 文本内容/问题内容,必填 - <code>answer</code> (text) 答案内容非必填如果数据集的模式为qa模式则传值
- answer(text) 答案内容非必填如果数据集的模式为qa模式则传值 - <code>keywords</code> (list) 关键字,非必填
- keywords(list) 关键字,非必填
</Property> </Property>
</Properties> </Properties>
</Col> </Col>
@ -779,14 +778,106 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
--- ---
错误信息 <Row>
- **document_indexing**: 文档索引失败 <Col>
- **provider_not_initialize**: Embedding 模型未配置 ### 错误信息
- **not_found**,文档不存在 <Properties>
- **dataset_name_duplicate**: 数据集名称重复 <Property name='code' type='string' key='code'>
- **provider_quota_exceeded**: 模型额度超过限制 返回的错误代码
- **dataset_not_initialized**: 数据集还未初始化 </Property>
- **unsupported_file_type**: 不支持的文件类型 </Properties>
- 目前只支持txt, markdown, md, pdf, html, htm, xlsx, docx, csv <Properties>
- **too_many_files**: 文件数量过多,暂时只支持单一文件上传 <Property name='status' type='number' key='status'>
- **file_too_large*: 文件太大默认支持15M以下, 具体需要参考环境变量配置 返回的错误状态
</Property>
</Properties>
<Properties>
<Property name='message' type='string' key='message'>
返回的错误信息
</Property>
</Properties>
</Col>
<Col>
<CodeGroup title="Example">
```json {{ title: 'Response' }}
{
"code": "no_file_uploaded",
"message": "Please upload your file.",
"status": 400
}
```
</CodeGroup>
</Col>
</Row>
<table className="max-w-auto border-collapse border border-slate-400" style={{ maxWidth: 'none', width: 'auto' }}>
<thead style={{ background: '#f9fafc' }}>
<tr>
<th class="p-2 border border-slate-300">code</th>
<th class="p-2 border border-slate-300">status</th>
<th class="p-2 border border-slate-300">message</th>
</tr>
</thead>
<tbody>
<tr>
<td class="p-2 border border-slate-300">no_file_uploaded</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">Please upload your file.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">too_many_files</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">Only one file is allowed.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">file_too_large</td>
<td class="p-2 border border-slate-300">413</td>
<td class="p-2 border border-slate-300">File size exceeded.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">unsupported_file_type</td>
<td class="p-2 border border-slate-300">415</td>
<td class="p-2 border border-slate-300">File type not allowed.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">high_quality_dataset_only</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">Current operation only supports 'high-quality' datasets.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">dataset_not_initialized</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">The dataset is still being initialized or indexing. Please wait a moment.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">archived_document_immutable</td>
<td class="p-2 border border-slate-300">403</td>
<td class="p-2 border border-slate-300">The archived document is not editable.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">dataset_name_duplicate</td>
<td class="p-2 border border-slate-300">409</td>
<td class="p-2 border border-slate-300">The dataset name already exists. Please modify your dataset name.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">invalid_action</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">Invalid action.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">document_already_finished</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">The document has been processed. Please refresh the page or go to the document details.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">document_indexing</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">The document is being processed and cannot be edited.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">invalid_metadata</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">The metadata content is incorrect. Please check and verify.</td>
</tr>
</tbody>
</table>
<div class="pb-4" />


@ -20,7 +20,7 @@ const LocaleLayout = ({
return ( return (
<html lang={locale ?? 'en'} className="h-full"> <html lang={locale ?? 'en'} className="h-full">
<body <body
className="h-full" className="h-full select-auto"
data-api-prefix={process.env.NEXT_PUBLIC_API_PREFIX} data-api-prefix={process.env.NEXT_PUBLIC_API_PREFIX}
data-pubic-api-prefix={process.env.NEXT_PUBLIC_PUBLIC_API_PREFIX} data-pubic-api-prefix={process.env.NEXT_PUBLIC_PUBLIC_API_PREFIX}
data-public-edition={process.env.NEXT_PUBLIC_EDITION} data-public-edition={process.env.NEXT_PUBLIC_EDITION}