Debugging Slack Integration: From 6 Duplicate Responses to Instant Acknowledgment
The Problem: Bob Responded 6 Times to Every Message
I integrated my AI agent (Bob’s Brain) with Slack, and it worked—sort of. Every time I sent a message, Bob responded six times with the exact same answer. The Cloudflare Tunnel logs showed constant timeout errors:
2025-10-09T08:12:20Z ERR Request failed error="Incoming request ended abruptly: context canceled"
This wasn’t a “minor bug”—this was a production-breaking issue that made the integration unusable.
The Journey: What Actually Happened
Starting Point: Unstable Tunnels
Before we even got to Slack, we had tunnel stability issues:
localhost.run kept changing URLs:
cf011aadb6f85d.lhr.life
0ca4fddc58e906.lhr.life
7aa0d045663613.lhr.life
Every URL change required updating Slack Event Subscriptions. Not sustainable.
Solution: Switched to Cloudflare Tunnel (cloudflared)
- Free, no account required for testing
- Stable URL: https://editor-steering-width-innovation.trycloudflare.com
- Persists as long as the process runs
# Install cloudflared
curl -sLO https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
sudo dpkg -i cloudflared-linux-amd64.deb
# Start tunnel in background
nohup cloudflared tunnel --url http://localhost:8080 > /tmp/cloudflared.log 2>&1 &
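Quick tunnels print their assigned URL into the log once they connect. A small sketch for pulling it out programmatically (the log path matches the nohup command above; the regex is an assumption about the URL shape):
import re

# Quick tunnels log their assigned *.trycloudflare.com URL on startup;
# grab the first match from the log written by the nohup command above.
with open("/tmp/cloudflared.log") as f:
    match = re.search(r"https://[a-z0-9-]+\.trycloudflare\.com", f.read())
print(match.group(0) if match else "tunnel URL not printed yet")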
Side Quest: LlamaIndex API Migration
While setting up Slack, Bob’s Knowledge Orchestrator was throwing deprecation warnings:
# OLD (deprecated)
from llama_index.core import ServiceContext, set_global_service_context
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=512)
set_global_service_context(service_context)
# NEW (Settings API)
from llama_index.core import Settings
Settings.llm = llm
Settings.chunk_size = 512
Why this mattered: Bob integrates three knowledge sources (653MB Knowledge DB, Analytics DB, Research index). The deprecation was blocking clean initialization.
Result after fix:
✅ Knowledge orchestrator initialized successfully
The Main Problem: Slack’s 3-Second Timeout
Slack verified the webhook URL successfully. Bob started responding to messages. But every message triggered 6 duplicate responses.
Initial code flow:
- Slack sends webhook event
- Bob processes entire LLM query (10-60 seconds)
- Bob sends Slack message
- Bob returns HTTP 200
Slack’s behavior:
- Waits 3 seconds for HTTP 200
- No response? Retry the event
- Keeps retrying until it gets acknowledgment
- Result: 4-6 duplicate event deliveries
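A rough mental model of that retry loop, sketched from the client side (this is not Slack's actual implementation; the attempt count and lack of backoff are simplifications):
import requests

def deliver_event(url, payload, max_attempts=4):
    """Hypothetical sketch of Slack-style delivery: retry until acknowledged."""
    for attempt in range(max_attempts):
        try:
            # Give the handler ~3 seconds to return HTTP 200
            resp = requests.post(url, json=payload, timeout=3)
            if resp.status_code == 200:
                return attempt + 1  # number of deliveries it took
        except requests.exceptions.Timeout:
            pass  # a slow handler looks like a failed delivery, so retry
    return max_attempts
Each aborted attempt also lines up with the "context canceled" errors in the Cloudflare Tunnel log above: the client gives up mid-request.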
The Debugging Process
First attempt: “Maybe it’s the tunnel?”
- Checked tunnel logs: Connection stable
- Tested endpoint locally:
curl http://localhost:8080/slack/events
→ Works fine
Second attempt: “Maybe it’s LLM response time?”
- Ollama (local): 5-15 seconds
- Groq (cloud): 2-8 seconds
- Even fastest responses exceeded Slack’s 3-second window
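One quick way to confirm where the time went is to time the LLM call in isolation (a sketch; pass in the same llm client and prompt the handler uses):
import time

def time_call(fn, *args):
    """Measure wall-clock latency of a single call."""
    start = time.perf_counter()
    result = fn(*args)
    print(f"round trip: {time.perf_counter() - start:.1f}s")  # consistently over 3s
    return result

# e.g. time_call(llm, prompt)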
Root cause identified:
@app.post("/slack/events")
def slack_events():
    # ... validation ...
    # ❌ PROBLEM: This takes 10-60 seconds
    answer = llm(prompt)
    slack_client.chat_postMessage(channel=channel, text=answer)
    # By the time we return HTTP 200, Slack has retried 4-6 times
    return jsonify({"ok": True})
The Solution: Immediate Acknowledgment + Background Processing
Key insight: Slack doesn’t need to wait for the LLM response. It just needs to know we received the event.
Implementation
1. Create background processing function:
import threading

_slack_event_cache = {}  # Deduplication cache

def _process_slack_message(text, channel, user, event_id):
    """Background processing - can take as long as needed"""
    try:
        # 1. Check cache
        cached = get_cached_llm_response(text)
        if cached:
            slack_client.chat_postMessage(channel=channel, text=cached['answer'])
            return
        # 2. Get conversation history
        history = get_conversation_history(user, limit=10)
        # 3. Route to optimal LLM
        routing = ROUTER.route(text)
        # 4. Query knowledge bases if complex
        knowledge_context = ""
        if routing['complexity'] > 0.3:
            knowledge_context = KNOWLEDGE.query(text, mode='auto')
        # 5. Generate answer
        llm = llm_client()
        prompt = build_conversation_prompt(history, text, knowledge_context)
        answer = llm(prompt)
        # 6. Send to Slack (no rush, we're in background)
        slack_client.chat_postMessage(
            channel=channel,
            text=f"{answer}\n\n_[via {routing['provider']}]_"
        )
        # 7. Cache and learn
        cache_llm_response(text, answer, ttl=3600)
        add_to_conversation(user, "user", text)
        add_to_conversation(user, "assistant", answer)
        COL.run_once([{"type": "slack_message", ...}])
    finally:
        # Cleanup dedup cache after 60 seconds
        threading.Timer(60, lambda: _slack_event_cache.pop(event_id, None)).start()
2. Modify webhook handler to return immediately:
@app.post("/slack/events")
def slack_events():
    payload = request.get_json(silent=True) or {}
    # Handle URL verification
    if payload.get("type") == "url_verification":
        return jsonify({"challenge": payload.get("challenge")})
    event = payload.get("event", {})
    event_id = payload.get("event_id", "")
    # ✅ CRITICAL: Deduplicate retries
    if event_id and event_id in _slack_event_cache:
        log.info(f"Ignoring duplicate event: {event_id}")
        return jsonify({"ok": True})
    if event_id:
        _slack_event_cache[event_id] = True
    # Validate event
    if event.get("bot_id") or event.get("type") not in ["message", "app_mention"]:
        return jsonify({"ok": True})
    text = event.get("text", "")
    channel = event.get("channel")
    user = event.get("user")
    if not text or not channel:
        return jsonify({"ok": True})
    # ✅ SOLUTION: Spawn background thread
    thread = threading.Thread(
        target=_process_slack_message,
        args=(text, channel, user, event_id),
        daemon=True
    )
    thread.start()
    # ✅ Return HTTP 200 immediately (< 100ms)
    log.info(f"Queued Slack event {event_id} for background processing")
    return jsonify({"ok": True})
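To sanity-check the fast acknowledgment, you can post a fake event straight at the endpoint and time the ack (a sketch; the IDs in the payload are made up):
import time
import requests

payload = {
    "type": "event_callback",
    "event_id": "Ev_TEST_0001",  # made-up ID for local testing
    "event": {"type": "message", "text": "Hey Bob", "channel": "C123", "user": "U123"},
}
start = time.perf_counter()
resp = requests.post("http://localhost:8080/slack/events", json=payload, timeout=3)
print(resp.status_code, f"{(time.perf_counter() - start) * 1000:.0f}ms")  # expect 200, well under 100ms
Posting the same payload twice should hit the dedup branch and log "Ignoring duplicate event". The background thread will still try to post to Slack, so expect errors there unless real credentials are configured.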
Why This Works
Before:
- Slack → Webhook → Process (10-60s) → HTTP 200
- Slack timeout → Retry → Process again → HTTP 200
- Result: 6 responses
After:
- Slack → Webhook → HTTP 200 (< 100ms)
- Background: Process → Send Slack message
- Deduplication: Retries ignored via event_id cache
- Result: 1 response
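One caveat on the dedup cache: a plain dict plus one threading.Timer per event works, but a lock-protected TTL set is sturdier under concurrent deliveries. A sketch of that variant (TTLSet is a name I'm introducing here, not code Bob runs):
import threading
import time

class TTLSet:
    """Thread-safe set of recently seen event IDs with per-entry expiry."""
    def __init__(self, ttl=60):
        self._ttl = ttl
        self._seen = {}  # event_id -> expiry timestamp
        self._lock = threading.Lock()

    def add_if_new(self, event_id):
        """Return True for a first delivery, False for a retry."""
        now = time.monotonic()
        with self._lock:
            # Drop expired entries, then check-and-set atomically
            self._seen = {k: v for k, v in self._seen.items() if v > now}
            if event_id in self._seen:
                return False
            self._seen[event_id] = now + self._ttl
            return True
The webhook handler would then call add_if_new(event_id) once instead of the separate check and write, which also closes the small race between the membership test and the assignment.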
Results
Performance:
- HTTP 200 acknowledgment: < 100ms (was 10-60 seconds)
- No more Cloudflare timeout errors
- One message in → One response out
Testing:
# Before fix
User: "Hey Bob"
Bob: [response 1]
Bob: [response 2]
Bob: [response 3]
Bob: [response 4]
Bob: [response 5]
Bob: [response 6]
# After fix
User: "Hey Bob"
Bob: [response] ✓
Bonus: DiagPro Training
While debugging, I also trained Bob on a 19,000-word DiagPro customer avatar document using the /learn endpoint:
curl -X POST http://localhost:8080/learn \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $BB_API_KEY" \
  -d '{
    "correction": "DiagPro is a $4.99 AI-powered automotive diagnostic platform targeting drivers aged 25-60 who fear being overcharged by mechanics..."
  }'
Bob’s Circle of Life learning system processes this knowledge and makes it available for queries through the Knowledge Orchestrator.
Key Lessons
- Webhook timeout limits are real - Slack’s 3-second timeout isn’t negotiable
- Background processing is essential - Don’t make the HTTP client wait for slow operations
- Deduplication is critical - Retries WILL happen; handle them gracefully
- Event IDs exist for a reason - Use them to detect duplicate deliveries
- Tunnel stability matters - Cloudflare Tunnel >>> localhost.run for production use
Related Posts
- Building AI-Friendly Codebase Documentation: Real-Time CLAUDE.md Creation Journey
- Building Multi-Brand RSS Validation System: 97% of 226 Feeds Tested
Tech Stack
- Python 3.12 with Flask
- Slack SDK for Python
- Cloudflare Tunnel for public HTTPS
- LlamaIndex for knowledge integration
- Ollama (local), Groq, Google Gemini (cloud LLMs)
- Redis for caching and conversation memory
Author: Jeremy Longshore
Email: jeremy@intentsolutions.io
GitHub: @jeremylongshore
Building production-grade AI agents with real-world integration lessons learned the hard way.