How to Build an FDA Form Autofill Agent with Arcade's Google Docs + Drive Toolkits

Arcade.dev Team
OCTOBER 21, 2025
10 MIN READ
THOUGHT LEADERSHIP

Medical device manufacturers waste hours copying data between documents when preparing 510(k) submissions. Each submission requires dozens of forms pulling information from test reports, design specs, and prior submissions. This guide shows how to build an AI agent that automates FDA form completion using Arcade's Google Docs and Google Drive toolkits.

Prerequisites

  • Active Arcade account with API key (get your API key)
  • Python 3.8 or higher
  • Google Cloud Console project with OAuth 2.0 credentials
  • Google Workspace access with Drive and Docs permissions
  • Basic async/await knowledge in Python

Agent Architecture

The autofill agent executes this workflow:

  1. User requests form completion through chat interface
  2. Agent searches Google Drive for source documents
  3. Agent reads document content using Google Docs toolkit
  4. LLM extracts required information from sources
  5. Agent creates and populates target form
  6. User reviews and approves autofilled content

Arcade handles OAuth authentication, token management, and API access throughout this process.

Authentication Setup

Note: The Google Docs toolkit requires a self-hosted Arcade instance. It is not available in Arcade Cloud.

Configure Google OAuth in Arcade

Add the Google OAuth provider to your Arcade instance:

  1. Access your Arcade Dashboard (default: http://localhost:9099/dashboard for self-hosted)
  2. Navigate to OAuth → Providers
  3. Click "Add OAuth Provider"
  4. Select "Included Providers" tab
  5. Choose "Google" from the dropdown
  6. Enter your Client ID and Client Secret
  7. Copy the generated Redirect URL
  8. Add this Redirect URL to your Google Cloud Console app's Authorized redirect URIs

Full configuration details: Google auth provider documentation

Install Required Packages

pip install arcadepy

For building custom tools:

pip install arcade-ai

Set Environment Variables

export ARCADE_API_KEY="your_arcade_api_key"
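
A startup check can surface a missing key before the first API call fails. The sketch below uses only the standard library; `require_env` is a hypothetical helper name, not part of arcadepy.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the agent")
    return value
```

Calling `require_env("ARCADE_API_KEY")` once at startup keeps the failure close to its cause.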

Build Document Search Component

Initialize Arcade Client

import os
from arcadepy import AsyncArcade
from typing import List, Dict, Any

class FDAFormAutofiller:
    def __init__(self):
        # AsyncArcade is the awaitable client; every method below uses `await`
        self.client = AsyncArcade(api_key=os.getenv("ARCADE_API_KEY"))
        self.user_sessions = {}

    async def authenticate_user(self, user_id: str) -> Dict[str, Any]:
        """Authenticate user for Google Drive and Docs access"""

        # Define required OAuth scopes
        scopes = [
            "https://www.googleapis.com/auth/drive",
            "https://www.googleapis.com/auth/documents",
        ]

        auth_response = await self.client.auth.start(
            user_id=user_id,
            provider="google",
            scopes=scopes
        )

        if auth_response.status != "completed":
            return {
                "authorization_required": True,
                "url": auth_response.url,
                "message": "Complete authorization to access Google Drive and Docs"
            }

        await self.client.auth.wait_for_completion(auth_response)

        self.user_sessions[user_id] = {"authenticated": True}

        return {"authenticated": True}

Search Drive for Source Documents

Use the GoogleDrive toolkit to locate relevant documents:

async def search_source_documents(
    self,
    user_id: str,
    query_terms: str,
    max_results: int = 20
) -> List[Dict[str, Any]]:
    """Search Google Drive for FDA source documents"""

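    # Verify authentication; stop early if the user still has to authorize
    if user_id not in self.user_sessions:
        auth_result = await self.authenticate_user(user_id)
        if auth_result.get("authorization_required"):
            raise PermissionError(
                f"Authorization required: {auth_result['url']}"
            )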

    # Execute search using GoogleDrive.SearchFiles
    result = await self.client.tools.execute(
        tool_name="GoogleDrive.SearchFiles",
        input={
            "query": query_terms,
            "limit": max_results
        },
        user_id=user_id
    )

    return result.output.get("files", [])

Available GoogleDrive toolkit tools: GoogleDrive reference
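
Drive searches can return PDFs, spreadsheets, and other files that a Google Docs reader cannot open. A minimal sketch of a filter follows, assuming each result is a dict with a `mimeType` key as in the Drive API file resource; the toolkit's actual output shape may differ, and `keep_google_docs` is a hypothetical helper.

```python
from typing import Any, Dict, List

# MIME type Google Drive assigns to native Google Docs files
GOOGLE_DOC_MIME = "application/vnd.google-apps.document"

def keep_google_docs(files: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only native Google Docs entries from a list of Drive search results.

    Assumption: each entry carries a "mimeType" key. Entries without it are dropped.
    """
    return [f for f in files if f.get("mimeType") == GOOGLE_DOC_MIME]
```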

Extract Information from Documents

Read Document Content

Use the GoogleDocs toolkit to retrieve document content. The toolkit provides these tools:

  • GoogleDocs.GetDocument - Get the latest version of a Google Doc
  • GoogleDocs.AppendText - Insert text at end of document
  • GoogleDocs.CreateBlankDocument - Create blank document with title
  • GoogleDocs.CreateDocument - Create document with title and content

async def get_document_content(
    self,
    user_id: str,
    document_id: str
) -> str:
    """Retrieve content from Google Doc in markdown format"""

    result = await self.client.tools.execute(
        tool_name="GoogleDocs.GetDocument",
        input={
            "document_id": document_id,
            "format": "MARKDOWN"  # Options: MARKDOWN, HTML, GOOGLE_API_JSON
        },
        user_id=user_id
    )

    return result.output.get("content", "")

Full tool documentation: Google Docs toolkit

Structure Information Extraction

Define a schema for FDA form data:

from pydantic import BaseModel, Field
from typing import List

class DeviceInformation(BaseModel):
    """Structured device information for FDA 510(k) forms"""
    device_name: str = Field(description="Device trade or common name")
    manufacturer: str = Field(description="Legal manufacturer name")
    classification_name: str = Field(description="Device classification per 21 CFR")
    product_code: str = Field(description="Three-letter FDA product code")
    intended_use: str = Field(description="Intended use statement")
    indications_for_use: str = Field(description="Specific clinical indications")
    predicate_device: str = Field(description="510(k) number of predicate (format: KYYXXXX)")
    technological_characteristics: List[str] = Field(
        description="Key technological features"
    )

async def extract_device_information(
    self,
    user_id: str,
    source_document_ids: List[str]
) -> DeviceInformation:
    """Extract structured device data from source documents"""

    # Retrieve content from all source documents
    documents = []
    for doc_id in source_document_ids:
        content = await self.get_document_content(user_id, doc_id)
        documents.append(content)

    combined_content = "\n\n---\n\n".join(documents)

    # Use LLM with structured output to extract information
    # Implementation depends on your LLM provider
    # Ensure the LLM returns data matching the DeviceInformation schema

    extraction_prompt = f"""
    Extract device information for FDA 510(k) submission from these documents:

    {combined_content}

    Return structured data following the DeviceInformation schema.
    """

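    # Plug in your LLM provider's structured-output call here, for example:
    # extracted_data = await your_llm.extract(extraction_prompt, schema=DeviceInformation)
    # return extracted_data
    raise NotImplementedError(
        "Connect an LLM call that returns a DeviceInformation instance"
    )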
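
If your LLM provider returns raw text rather than a typed object, the reply has to be parsed and checked before it reaches the schema. The sketch below is a standard-library stand-in, not the article's extraction method: it assumes the model was asked to reply with a single JSON object, and `parse_extraction_reply` is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Field names mirror the DeviceInformation schema above
REQUIRED_FIELDS = [
    "device_name", "manufacturer", "classification_name", "product_code",
    "intended_use", "indications_for_use", "predicate_device",
    "technological_characteristics",
]

FENCE = "`" * 3  # Markdown code-fence marker some models wrap JSON in

def parse_extraction_reply(reply: str) -> Dict[str, Any]:
    """Parse an LLM reply expected to hold one JSON object with the schema fields."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening and closing fence lines, keep the JSON between them
        lines = text.splitlines()
        text = "\n".join(l for l in lines if not l.strip().startswith(FENCE))
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"Missing fields: {', '.join(missing)}")
    return data
```

The returned dict can then be passed to the schema, for example `DeviceInformation(**data)`, so that Pydantic performs the type checks.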

Create and Populate Forms

Generate Form Template

Create FDA form templates in Google Docs:

async def create_form_document(
    self,
    user_id: str,
    form_type: str,
    title: str
) -> str:
    """Create new FDA form document with template"""

    # Get form template
    template = self.get_form_template(form_type)

    # Create document with content using GoogleDocs.CreateDocument
    result = await self.client.tools.execute(
        tool_name="GoogleDocs.CreateDocument",
        input={
            "title": title,
            "content": template
        },
        user_id=user_id
    )

    return result.output.get("document_id")

def get_form_template(self, form_type: str) -> str:
    """Return FDA form template structure"""

    templates = {
        "indications_for_use": """
INDICATIONS FOR USE STATEMENT
FDA Form 3881

Device Name: [DEVICE_NAME]

Manufacturer: [MANUFACTURER]

510(k) Number: [510K_NUMBER]

INDICATIONS FOR USE:
[INDICATIONS]

PRESCRIPTION USE: [ ] Yes  [ ] No
(Part 21 CFR 801 Subpart D)

OVER-THE-COUNTER USE: [ ] Yes  [ ] No
(21 CFR 801 Subpart C)
        """,

        "device_description": """
DEVICE DESCRIPTION

Device Name: [DEVICE_NAME]

Product Code: [PRODUCT_CODE]

Classification Name: [CLASSIFICATION]

PHYSICAL DESCRIPTION:
[PHYSICAL_DESC]

TECHNOLOGICAL CHARACTERISTICS:
[TECH_CHARACTERISTICS]

MATERIALS:
[MATERIALS]
        """,

        "510k_summary": """
510(k) SUMMARY

Submitter Information:
Name: [MANUFACTURER]
Address: [ADDRESS]

Device Information:
Trade Name: [DEVICE_NAME]
Common Name: [COMMON_NAME]
Classification Name: [CLASSIFICATION]
Product Code: [PRODUCT_CODE]

Predicate Device:
510(k) Number: [PREDICATE_510K]
Trade Name: [PREDICATE_NAME]

Substantial Equivalence:
[SE_COMPARISON]
        """
    }

    return templates.get(form_type, "")

Populate Form with Extracted Data

Replace template placeholders with extracted information:

async def autofill_form(
    self,
    user_id: str,
    document_id: str,
    device_info: DeviceInformation
) -> Dict[str, Any]:
    """Autofill FDA form with extracted device information"""

    # Get current document content
    current_content = await self.get_document_content(user_id, document_id)

    # Create replacement mapping
    replacements = {
        "[DEVICE_NAME]": device_info.device_name,
        "[MANUFACTURER]": device_info.manufacturer,
        "[PRODUCT_CODE]": device_info.product_code,
        "[CLASSIFICATION]": device_info.classification_name,
        "[INDICATIONS]": device_info.indications_for_use,
        # The predicate's K number fills the predicate field only. The submission's
        # own number ([510K_NUMBER]) is a different value and is left for manual entry.
        "[PREDICATE_510K]": device_info.predicate_device,
        "[TECH_CHARACTERISTICS]": "\n".join(
            f"• {char}" for char in device_info.technological_characteristics
        )
    }

    # Replace placeholders
    updated_content = current_content
    for placeholder, value in replacements.items():
        updated_content = updated_content.replace(placeholder, value)

    # Create new populated document
    result = await self.client.tools.execute(
        tool_name="GoogleDocs.CreateDocument",
        input={
            "title": f"Completed - {document_id}",
            "content": updated_content
        },
        user_id=user_id
    )

    return {
        "success": True,
        "document_id": result.output.get("document_id"),
        "document_url": result.output.get("url")
    }
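
The replacement mapping covers only some of the placeholders in the templates, so fields such as [ADDRESS] or [MATERIALS] stay unfilled. A small check, sketched here with a hypothetical `find_unfilled_placeholders` helper, can list what still needs manual entry. It assumes placeholders are upper-case tokens in square brackets, as in the templates above; checkbox markers like `[ ]` do not match.

```python
import re
from typing import List

# Matches bracketed upper-case tokens such as [DEVICE_NAME] or [510K_NUMBER]
PLACEHOLDER_PATTERN = re.compile(r"\[([A-Z0-9_]+)\]")

def find_unfilled_placeholders(content: str) -> List[str]:
    """Return placeholder names still present in the content, in first-seen order."""
    seen: List[str] = []
    for name in PLACEHOLDER_PATTERN.findall(content):
        if name not in seen:
            seen.append(name)
    return seen
```

Returning the list alongside the document URL gives the reviewer a checklist of fields to complete. Note that a filled value that itself contains a bracketed upper-case token would be reported as well.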

Build Complete Agent Workflow

Orchestrate Full Process

Combine the components into a complete autofill workflow:

async def process_form_autofill_request(
    self,
    user_id: str,
    form_type: str,
    search_query: str
) -> Dict[str, Any]:
    """Execute complete FDA form autofill workflow"""

    try:
        # Step 1: Authenticate user
        auth_result = await self.authenticate_user(user_id)
        if auth_result.get("authorization_required"):
            return auth_result

        # Step 2: Search for source documents
        source_docs = await self.search_source_documents(
            user_id,
            search_query,
            max_results=20
        )

        if not source_docs:
            return {
                "success": False,
                "message": "No source documents found. Refine your search query."
            }

        # Step 3: Extract information from top 5 documents
        doc_ids = [doc["id"] for doc in source_docs[:5]]
        device_info = await self.extract_device_information(user_id, doc_ids)

        # Step 4: Validate extracted information
        validation = self.validate_device_information(device_info)
        if not validation["valid"]:
            return {
                "success": False,
                "errors": validation["errors"],
                "message": "Extracted information failed validation"
            }

        # Step 5: Create form template
        form_title = f"FDA {form_type} - {device_info.device_name}"
        form_doc_id = await self.create_form_document(
            user_id,
            form_type,
            form_title
        )

        # Step 6: Autofill form
        result = await self.autofill_form(user_id, form_doc_id, device_info)

        return {
            "success": True,
            "form_document_id": result["document_id"],
            "form_document_url": result["document_url"],
            "source_documents": [doc["name"] for doc in source_docs[:5]],
            "device_name": device_info.device_name,
            "validation_warnings": validation.get("warnings", [])
        }

    except Exception as e:
        return {
            "success": False,
            "error": str(e)
        }

Validate Extracted Information

Implement validation against FDA requirements:

def validate_device_information(self, device_info: DeviceInformation) -> Dict[str, Any]:
    """Validate extracted device data meets FDA requirements"""

    errors = []
    warnings = []

    # Validate device name
    if not device_info.device_name or len(device_info.device_name) < 3:
        errors.append("Device name required (minimum 3 characters)")

    # Validate product code format
    if not device_info.product_code or len(device_info.product_code) != 3:
        errors.append("Product code must be exactly 3 characters")

    # Validate 510(k) predicate format
    if not device_info.predicate_device.startswith("K"):
        errors.append("Predicate 510(k) number must start with 'K'")

    # Check intended use detail
    if len(device_info.intended_use) < 50:
        warnings.append("Intended use statement should be more detailed")

    # Validate technological characteristics
    if len(device_info.technological_characteristics) < 3:
        warnings.append("Consider adding more technological characteristics")

    return {
        "valid": len(errors) == 0,
        "errors": errors,
        "warnings": warnings
    }
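
The length and prefix checks above accept values such as "A1C" or "Kabc". A stricter sketch with regular expressions follows, assuming a product code is exactly three letters (as the schema description states) and that KYYXXXX means the letter K followed by six digits. The helper names are hypothetical, and other submission number formats are not covered.

```python
import re

PRODUCT_CODE_RE = re.compile(r"[A-Z]{3}")  # three letters
K_NUMBER_RE = re.compile(r"K\d{6}")        # K plus six digits (KYYXXXX)

def is_valid_product_code(code: str) -> bool:
    """True if the code is exactly three letters, ignoring case and outer spaces."""
    return PRODUCT_CODE_RE.fullmatch(code.strip().upper()) is not None

def is_valid_k_number(number: str) -> bool:
    """True if the value is K followed by six digits, ignoring case and outer spaces."""
    return K_NUMBER_RE.fullmatch(number.strip().upper()) is not None
```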

Integrate with Agent Frameworks

LangChain Integration

Use Arcade tools with LangChain:

from arcadepy import AsyncArcade

async def setup_arcade_tools_for_langchain(user_id: str):
    """Get Arcade tools formatted for LangChain"""

    # AsyncArcade is required because the calls below are awaited
    client = AsyncArcade()

    # Get Google Docs and Drive tools
    docs_tools = await client.tools.list(toolkit="google_docs", user_id=user_id)
    drive_tools = await client.tools.list(toolkit="google_drive", user_id=user_id)

    # Authorize all tools
    all_tools = docs_tools.items + drive_tools.items
    for tool in all_tools:
        auth_result = await client.tools.authorize(
            tool_name=tool.name,
            user_id=user_id
        )
        if auth_result.status != "completed":
            print(f"Authorize: {auth_result.url}")
            await client.auth.wait_for_completion(auth_result)

    return all_tools

Full LangChain integration guide: Using Arcade tools with LangChain

Google ADK Integration

Use Arcade with Google ADK:

from google.adk import Agent
from google_adk_arcade.tools import get_arcade_tools
from arcadepy import AsyncArcade

async def create_fda_agent(user_id: str):
    """Create FDA form agent with Google ADK"""

    client = AsyncArcade()

    # Get Google toolkit tools
    tools = await get_arcade_tools(
        client,
        toolkits=["google_docs", "google_drive"]
    )

    # Authorize tools
    for tool in tools:
        result = await client.tools.authorize(
            tool_name=tool.name,
            user_id=user_id
        )
        if result.status != "completed":
            await client.auth.wait_for_completion(result)

    # Create agent
    agent = Agent(
        model="gemini-2.0-flash",
        name="fda_form_agent",
        instruction="""
        You are an FDA regulatory assistant for 510(k) submissions.
        Search Google Drive for relevant documents, extract device information,
        and populate FDA form templates. Verify all information with the user
        before finalizing forms.
        """,
        tools=tools
    )

    return agent

Integration documentation: Google ADK with Arcade

Handle Errors and Edge Cases

Implement Retry Logic

import asyncio

async def safe_execute_tool(
    self,
    tool_name: str,
    input_params: Dict[str, Any],
    user_id: str,
    max_retries: int = 3
) -> Any:
    """Execute tool with retry logic"""

    last_error = None

    for attempt in range(max_retries):
        try:
            return await self.client.tools.execute(
                tool_name=tool_name,
                input=input_params,
                user_id=user_id
            )

        except Exception as e:
            last_error = e
            error_type = getattr(e, 'type', 'unknown')

            # Handle authorization errors, then retry
            if error_type == "authorization_required":
                await self.authenticate_user(user_id)

            # Handle rate limits with exponential backoff, then retry
            elif error_type == "rate_limit_exceeded":
                await asyncio.sleep(2 ** attempt)

    # Every attempt failed, including retries after re-authorization or backoff
    raise RuntimeError(
        f"Failed after {max_retries} attempts: {last_error}"
    ) from last_error
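
The backoff schedule is easier to reason about, and to test, when it is computed separately from the retry loop. This is a sketch of one common variant, capped exponential backoff with optional jitter; `backoff_delays` and its parameters are illustrative choices, not values recommended by Arcade.

```python
import random
from typing import List, Optional

def backoff_delays(
    max_retries: int,
    base: float = 1.0,
    cap: float = 30.0,
    jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Seconds to wait before each retry: base * 2**attempt, capped, plus jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        if jitter > 0:
            # Random extra wait so that many clients do not retry in lockstep
            delay += rng.uniform(0, jitter)
        delays.append(delay)
    return delays
```

The cap keeps a long retry sequence from sleeping for minutes, and the jitter spreads out retries when several users hit a rate limit at once.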

Handle Authorization Callbacks

Manage user authorization flows:

import os

from arcadepy import AsyncArcade
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

@app.post("/api/oauth/callback")
async def oauth_callback(req: Request):
    """Handle OAuth callback completion"""

    body = await req.json()
    user_id = body.get("userId")

    # AsyncArcade is required because the calls below are awaited
    arcade = AsyncArcade(api_key=os.getenv("ARCADE_API_KEY"))

    # Load available tools after authorization
    docs_tools = await arcade.tools.list(toolkit="google_docs", user_id=user_id)
    drive_tools = await arcade.tools.list(toolkit="google_drive", user_id=user_id)

    return JSONResponse({
        "success": True,
        "available_tools": len(docs_tools.items) + len(drive_tools.items)
    })

Optimize Performance

Implement Document Caching

from collections import OrderedDict
import time

class DocumentCache:
    """LRU cache for document content"""

    def __init__(self, max_size: int = 100, ttl_seconds: int = 3600):
        self.cache = OrderedDict()
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds

    def get(self, doc_id: str):
        """Retrieve cached document"""
        if doc_id not in self.cache:
            return None

        content, timestamp = self.cache[doc_id]

        # Check expiration
        if time.time() - timestamp > self.ttl_seconds:
            del self.cache[doc_id]
            return None

        # Move to end (recently used)
        self.cache.move_to_end(doc_id)
        return content

    def set(self, doc_id: str, content: str):
        """Cache document content"""
        if doc_id in self.cache:
            self.cache.move_to_end(doc_id)

        self.cache[doc_id] = (content, time.time())

        # Evict oldest if full
        if len(self.cache) > self.max_size:
            self.cache.popitem(last=False)
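
A cache like this is usually wrapped in a read-through helper so callers do not repeat the lookup logic. The sketch below assumes only that the cache exposes `get` and `set` as defined above; `get_or_fetch` is a hypothetical name, and `fetch` stands for any coroutine that loads a document by ID.

```python
from typing import Any, Awaitable, Callable

async def get_or_fetch(
    cache: Any,
    doc_id: str,
    fetch: Callable[[str], Awaitable[str]],
) -> str:
    """Return cached content if present; otherwise fetch it and store it."""
    cached = cache.get(doc_id)
    if cached is not None:  # an empty document ("") still counts as a hit
        return cached
    content = await fetch(doc_id)
    cache.set(doc_id, content)
    return content
```

Inside the agent, `fetch` could be a small wrapper around `get_document_content` with the user ID already bound.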

Batch Document Processing

async def batch_process_documents(
    self,
    user_id: str,
    document_ids: List[str],
    batch_size: int = 5
) -> List[str]:
    """Process documents in batches"""

    results = []

    for i in range(0, len(document_ids), batch_size):
        batch = document_ids[i:i + batch_size]

        # Process batch concurrently
        tasks = [
            self.get_document_content(user_id, doc_id)
            for doc_id in batch
        ]

        batch_results = await asyncio.gather(*tasks, return_exceptions=True)

        # Filter exceptions
        valid_results = [
            r for r in batch_results
            if not isinstance(r, Exception)
        ]

        results.extend(valid_results)

    return results
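
Filtering out exceptions hides which documents failed, which matters when the output feeds a regulatory form. An alternative sketch follows: it bounds concurrency with a semaphore instead of fixed batches and reports successes and failures by document ID. `fetch_all` is a hypothetical helper, and `fetch` is any coroutine that loads one document.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

async def fetch_all(
    document_ids: List[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrent: int = 5,
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Fetch documents with bounded concurrency; return (contents, failures) by ID."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fetch_one(doc_id: str) -> str:
        async with semaphore:  # at most max_concurrent fetches run at once
            return await fetch(doc_id)

    outcomes = await asyncio.gather(
        *(fetch_one(doc_id) for doc_id in document_ids),
        return_exceptions=True,
    )

    contents: Dict[str, str] = {}
    failures: Dict[str, Exception] = {}
    for doc_id, outcome in zip(document_ids, outcomes):
        if isinstance(outcome, Exception):
            failures[doc_id] = outcome
        else:
            contents[doc_id] = outcome
    return contents, failures
```

The failures dict can be surfaced to the reviewer so a missing source document is visible rather than silently skipped.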

Deploy to Production

Security Configuration

Store credentials securely:

import os
from typing import Optional

import keyring
from cryptography.fernet import Fernet

class SecureCredentialManager:
    """Secure credential storage"""

    def __init__(self):
        encryption_key = os.getenv("ENCRYPTION_KEY")
        if not encryption_key:
            raise ValueError("ENCRYPTION_KEY environment variable required")

        self.fernet = Fernet(encryption_key.encode())

    def store_token(self, user_id: str, token: str):
        """Store encrypted token"""
        encrypted = self.fernet.encrypt(token.encode())
        keyring.set_password("fda_agent", user_id, encrypted.decode())

    def retrieve_token(self, user_id: str) -> Optional[str]:
        """Retrieve decrypted token, or None if nothing is stored for the user"""
        encrypted = keyring.get_password("fda_agent", user_id)
        if not encrypted:
            return None
        return self.fernet.decrypt(encrypted.encode()).decode()

Implement Monitoring

import logging
from datetime import datetime

class AgentMonitor:
    """Structured logging for agent operations"""

    def __init__(self):
        self.logger = logging.getLogger("fda_agent")
        self.logger.setLevel(logging.INFO)

        handler = logging.StreamHandler()
        formatter = logging.Formatter(
            '%(asctime)s - %(levelname)s - %(message)s'
        )
        handler.setFormatter(formatter)
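        # Attach the handler once; repeated instantiation would duplicate log lines
        if not self.logger.handlers:
            self.logger.addHandler(handler)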

    def log_autofill(self, user_id: str, form_type: str, status: str):
        """Log form autofill operation"""
        self.logger.info(
            f"Autofill | User: {user_id} | Form: {form_type} | Status: {status}"
        )

    def log_document_access(self, user_id: str, doc_id: str, action: str):
        """Log document access for audit trail"""
        self.logger.info(
            f"Document | User: {user_id} | Doc: {doc_id} | Action: {action} | "
            f"Time: {datetime.now().isoformat()}"
        )
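
Audit entries are easier to search and export when each one is a single JSON object rather than a formatted string. A minimal sketch, using only the standard library: the field names and the `audit_record` helper are illustrative assumptions, not a prescribed audit format.

```python
import json
from datetime import datetime, timezone
from typing import Optional

def audit_record(
    user_id: str,
    doc_id: str,
    action: str,
    now: Optional[datetime] = None,
) -> str:
    """Build one JSON line describing a document access, timestamped in UTC."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps(
        {
            "event": "document_access",
            "user_id": user_id,
            "doc_id": doc_id,
            "action": action,
            "timestamp": timestamp,
        },
        sort_keys=True,
    )
```

The string can be passed straight to `logger.info(...)`; the optional `now` argument exists so tests can supply a fixed time.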

Configure Rate Limiting

from datetime import datetime, timedelta
from collections import defaultdict

class RateLimiter:
    """Rate limiting for API operations"""

    def __init__(self, max_calls: int = 100, window_minutes: int = 60):
        self.max_calls = max_calls
        self.window = timedelta(minutes=window_minutes)
        self.calls = defaultdict(list)

    def check_limit(self, user_id: str) -> bool:
        """Check if user exceeded rate limit"""
        now = datetime.now()

        # Remove expired calls
        self.calls[user_id] = [
            call_time for call_time in self.calls[user_id]
            if now - call_time < self.window
        ]

        # Check limit
        if len(self.calls[user_id]) >= self.max_calls:
            return False

        # Record call
        self.calls[user_id].append(now)
        return True

Test Your Agent

Unit Tests

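import pytest
from unittest.mock import AsyncMock, MagicMock

# Assumes FDAFormAutofiller and DeviceInformation are importable from your own module
# and that ARCADE_API_KEY is set (any placeholder value) so the client can be constructed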

@pytest.fixture
def mock_client():
    """Mock Arcade client"""
    client = AsyncMock()
    client.auth.start = AsyncMock(return_value=MagicMock(status="completed"))
    client.tools.execute = AsyncMock()
    return client

@pytest.mark.asyncio
async def test_document_search(mock_client):
    """Test document search functionality"""
    autofiller = FDAFormAutofiller()
    autofiller.client = mock_client

    # Mock search results
    mock_client.tools.execute.return_value = MagicMock(
        output={"files": [{"id": "doc1", "name": "Device Spec"}]}
    )

    results = await autofiller.search_source_documents(
        "test_user",
        "device specification"
    )

    assert len(results) == 1
    assert results[0]["name"] == "Device Spec"

@pytest.mark.asyncio
async def test_validation():
    """Test device information validation"""
    autofiller = FDAFormAutofiller()

    # Invalid device info
    invalid_info = DeviceInformation(
        device_name="",
        manufacturer="Test Corp",
        classification_name="Class II",
        product_code="AB",  # Invalid length
        intended_use="Test",
        indications_for_use="Test",
        predicate_device="123",  # Invalid format
        technological_characteristics=[]
    )

    validation = autofiller.validate_device_information(invalid_info)

    assert not validation["valid"]
    assert len(validation["errors"]) > 0

Best Practices

Form Template Management

  • Store templates in version-controlled Drive folder
  • Use naming convention: FDA_[FormNumber]_[FormName]_[ExpirationDate].gdoc
  • Implement template version checking
  • Set alerts for form expiration dates
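
The naming convention can be checked in code so that an expired template is caught before it is used. The sketch below assumes the expiration date is written as YYYY-MM-DD, which the convention above does not specify. The helper names are hypothetical and the date in the usage note is made up for illustration.

```python
import re
from datetime import date
from typing import Optional, Tuple

# FDA_[FormNumber]_[FormName]_[ExpirationDate].gdoc, with the date as YYYY-MM-DD
TEMPLATE_NAME_RE = re.compile(
    r"FDA_(?P<number>[^_]+)_(?P<name>.+)_(?P<expires>\d{4}-\d{2}-\d{2})\.gdoc"
)

def parse_template_name(filename: str) -> Optional[Tuple[str, str, date]]:
    """Split a template filename into (form number, form name, expiration date)."""
    match = TEMPLATE_NAME_RE.fullmatch(filename)
    if match is None:
        return None
    return match["number"], match["name"], date.fromisoformat(match["expires"])

def is_expired(filename: str, today: date) -> bool:
    """True if the template's expiration date is before the given day."""
    parsed = parse_template_name(filename)
    if parsed is None:
        raise ValueError(f"Unrecognized template name: {filename}")
    return parsed[2] < today
```

For example, `is_expired("FDA_3881_Indications_For_Use_2026-06-30.gdoc", date.today())` reports whether that illustrative template has passed its date.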

Data Validation

  • Cross-reference device names with FDA database
  • Verify predicate 510(k) numbers
  • Validate date formats
  • Check character limits per form specifications

Audit Trail Requirements

  • Record all document accesses with timestamps
  • Log all form modifications
  • Track source documents used for autofill
  • Store original and modified versions
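
One lightweight way to tie the stored versions to the audit log is to record a content hash for each. This is a sketch under that assumption; `version_record` is a hypothetical helper, and SHA-256 is one reasonable choice rather than a stated requirement.

```python
import hashlib
from typing import Dict, Union

def version_record(original: str, modified: str) -> Dict[str, Union[str, bool]]:
    """Hash the original and modified text so later tampering or drift is detectable."""
    original_hash = hashlib.sha256(original.encode("utf-8")).hexdigest()
    modified_hash = hashlib.sha256(modified.encode("utf-8")).hexdigest()
    return {
        "original_sha256": original_hash,
        "modified_sha256": modified_hash,
        "changed": original_hash != modified_hash,
    }
```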

User Review Workflow

  • Never submit forms without human review
  • Highlight autofilled sections
  • Provide source document links
  • Enable inline commenting

Conclusion

This FDA form autofill agent shows how Arcade's Google Docs and Drive toolkits automate regulatory workflows. Arcade handles OAuth authentication, token management, and secure API access, letting you focus on agent logic.

The agent provides:

  • Secure multi-user authentication
  • Intelligent document search
  • Automated information extraction
  • Form population with validation
  • Error handling and retry logic
  • Production-ready security

Regulatory teams can cut the copy-and-paste portion of form preparation from hours to minutes, while human review and the audit trail keep the output accurate and traceable. The architecture extends to other FDA form types by adjusting the extraction schemas and templates.

For production deployment, self-host Arcade for enhanced control over authentication flows and data residency.
