Apache OpenDAL S3

UnstructuredApacheOpendalS3FileLoader is a document loader that reads a file from an S3 bucket through the Apache OpenDAL library and parses it with the unstructured library, so its output fits into the unstructured document processing pipeline.

Overview

Integration details

Class	Package	Local	Serializable	JS support
UnstructuredApacheOpendalS3FileLoader	langchain_community	✅	❌	❌

Loader features

Source	Document Lazy Loading	Native Async Support
UnstructuredApacheOpendalS3FileLoader	✅	❌

Setup

Credentials

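The UnstructuredApacheOpendalS3FileLoader needs no credentials of its own. Access to the bucket is governed by AWS: the instantiation example below passes aws_access_key_id and aws_secret_access_key, which a private bucket requires.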

Installation

%pip install --upgrade --quiet opendal unstructured

Instantiation

Import the loader, then instantiate it with the object key, bucket name, region, and AWS credentials:

from langchain_community.document_loaders.apache_opendal_s3 import (
    UnstructuredApacheOpendalS3FileLoader,
)

API Reference:UnstructuredApacheOpendalS3FileLoader

key = "data2.csv"
bucket = "liugddx"
region_name = "ap-northeast-1"

loader = UnstructuredApacheOpendalS3FileLoader(
    key,
    bucket,
    region_name,
    aws_access_key_id="xxx",
    aws_secret_access_key="xxx",
)
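The example above hard-codes placeholder credentials. A minimal sketch of one alternative is to read them from environment variables and pass them as keyword arguments. The helper name s3_credentials_from_env is hypothetical, and the variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are an assumption based on the usual AWS convention, not something the loader is documented to read on its own.

```python
import os


def s3_credentials_from_env(environ=None):
    """Build the two credential keyword arguments from environment variables.

    Hypothetical helper: the loader only receives these values as ordinary
    keyword arguments, exactly as in the instantiation example.
    """
    env = os.environ if environ is None else environ
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


# Usage with the loader from the example above (not executed here):
# loader = UnstructuredApacheOpendalS3FileLoader(
#     key, bucket, region_name, **s3_credentials_from_env()
# )
```

Failing early with a KeyError keeps a missing secret from surfacing later as an opaque S3 authorization error.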

Load

Call .load() to fetch the file from the S3 bucket and return its contents as a list of Document objects.

docs = loader.load()

/Users/liugddx/code/langchain/.venv/lib/python3.10/site-packages/unstructured/partition/csv.py:84: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support sep=None with delim_whitespace=False; you can avoid this warning by specifying engine='python'.
  dataframe = pd.read_csv(file, header=ctx.header, sep=ctx.delimiter)
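The ParserWarning shown above is raised by pandas inside unstructured and does not stop the load. If the noise is unwanted, one option is to ignore only that message around the call. The sketch below uses the standard warnings module; call_quietly is a hypothetical helper, and it matches on the message text rather than on the pandas warning class so that it does not depend on pandas being importable.

```python
import warnings

# Prefix of the warning message shown in the output above.
FALLBACK_MESSAGE = r"Falling back to the 'python' engine"


def call_quietly(func, *args, **kwargs):
    """Run func while ignoring only the CSV engine fallback warning."""
    with warnings.catch_warnings():
        # The pattern is matched against the start of the warning message.
        warnings.filterwarnings("ignore", message=FALLBACK_MESSAGE)
        return func(*args, **kwargs)


# Usage with the loader from the example above (not executed here):
# docs = call_quietly(loader.load)
```

Other warnings still surface, because the filter is scoped to the one message and is undone when the with block exits.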

Check how many documents were returned, then inspect the text content of the first one:

len(docs)

docs[0].page_content

'\n\n\n\nD\n\n\nA10101010\nNone\n\n\n'
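The page_content above consists mostly of newlines. As a small illustration of post-processing, the sketch below keeps only the non-empty lines; non_empty_lines is a hypothetical helper, and the sample string is copied from the output above.

```python
def non_empty_lines(page_content):
    """Return the stripped, non-blank lines of a document's text."""
    return [line.strip() for line in page_content.splitlines() if line.strip()]


sample = "\n\n\n\nD\n\n\nA10101010\nNone\n\n\n"
print(non_empty_lines(sample))  # prints ['D', 'A10101010', 'None']
```

With a loaded document the call would be non_empty_lines(docs[0].page_content).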

Lazy Load

The UnstructuredApacheOpendalS3FileLoader supports lazy loading. Instead of returning a full list, .lazy_load() returns an iterator that yields documents one at a time as you consume it, which keeps memory use down when a file produces many documents.

for doc in loader.lazy_load():
    print(doc.page_content)

D


A10101010
None

/Users/liugddx/code/langchain/.venv/lib/python3.10/site-packages/unstructured/partition/csv.py:84: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support sep=None with delim_whitespace=False; you can avoid this warning by specifying engine='python'.
  dataframe = pd.read_csv(file, header=ctx.header, sep=ctx.delimiter)
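Because lazy_load() yields documents one at a time, downstream steps can consume them in fixed-size batches without building the full list first. The following is a minimal sketch using itertools.islice; batched and fake_lazy_load are hypothetical names, and the generator stands in for loader.lazy_load() so the example runs without an S3 bucket.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items, pulling from the iterable lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fake_lazy_load():
    """Hypothetical stand-in for loader.lazy_load(); yields plain strings."""
    for text in ("D", "A10101010", "None"):
        yield text


for batch in batched(fake_lazy_load(), 2):
    print(batch)
```

With the real loader, the call would be batched(loader.lazy_load(), 2), and each batch would hold Document objects rather than strings.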

API reference

For further information, please refer to the API reference.

Related

Document loader conceptual guide
Document loader how-to guides
