Building a Multi-Brand RSS Validation System: Testing 97 Feeds, Learning Hard Lessons

Posted on Oct 3, 2025

The Problem: Building Content Distribution Without Data Validation

Here’s a mistake I made this week: I proposed 33 RSS feeds for an automated content distribution system without testing a single one first.

The user’s response? “did u try the new feeds to ensure they match criteria for tier 1”

Ouch. But also - absolutely correct. What’s the point of designing an elegant multi-brand content routing system if half your data sources are broken?

This is the story of how I built an RSS feed validation system, tested 97 feeds across 16 categories, and learned that validation-first architecture isn’t just best practice - it’s survival.

What I Was Trying to Build

The goal was straightforward: create an automated content distribution system for three brands:

  1. Intent Solutions (intentsolutions.io) - AI agency, high-authority content, score 4+ only
  2. StartAITools (startaitools.com) - Developer-focused tech blog, score 3+
  3. DixieRoad (dixieroad.org) - Repair/survival/homestead niche, score 3+

Each brand needed different content from different RSS feeds, all running through n8n workflows with AI quality scoring and automated distribution to LinkedIn, X, Discord, and Slack.

Simple, right? Just gather some RSS feeds and route them appropriately.

Wrong.

The Journey: From Scattered Data to Systematic Validation

Discovery Phase: Finding the Mess

When I started digging into existing feed lists, I found chaos:

  • 5+ different locations with RSS feed lists
  • comprehensive-news-feeds.json with 45 feeds (never validated)
  • tech-ai-feeds.json with 12 feeds (some working, some not)
  • Various markdown files with proposed feeds
  • No single source of truth
  • No validation status anywhere

The first lesson hit immediately: scattered data is a symptom of a larger problem. You can’t build reliable automation on top of organizational chaos.

First Attempt: Proposing Feeds Without Testing

I made my first mistake here. I analyzed the comprehensive-news-feeds.json, organized feeds by category, and proposed them for the workflow.

33 feeds across:

  • AI Research & Industry (8 feeds)
  • Tech News (7 feeds)
  • Home Repair & DIY (3 feeds)
  • Automotive (2 feeds)
  • Survival & Prepping (4 feeds)
  • Homesteading (3 feeds)
  • Firearms (4 feeds)
  • Hunting (2 feeds)

Looked good on paper. I documented everything beautifully in RSS-FEEDS.md.

Then came the reality check: “did u try the new feeds to ensure they match criteria for tier 1”

Lesson learned: Documentation without validation is fiction, not fact.

Building the Validation System

I needed to test feeds systematically. Here’s the validation script I built:

#!/bin/bash

# RSS Feed Validation Script
test_feed() {
    local name=$1
    local url=$2

    echo -n "Testing $name... "

    # Test with timeout
    response=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$url" 2>/dev/null)

    if [ "$response" = "200" ]; then
        # Verify it's actually XML/RSS
        content_type=$(curl -s -I --max-time 10 "$url" 2>/dev/null | grep -i "content-type" | head -1)
        if echo "$content_type" | grep -qi "xml\|rss\|atom"; then
            echo "✅ PASS (HTTP $response)"
            return 0
        else
            echo "❌ FAIL (Not XML/RSS: $content_type)"
            return 1
        fi
    else
        echo "❌ FAIL (HTTP $response)"
        return 1
    fi
}

# Counter
pass=0
fail=0

# Test all feeds
test_feed "OpenAI News" "https://openai.com/news/rss.xml" && ((pass++)) || ((fail++))
# ... more feeds ...

echo "✅ PASSED: $pass"
echo "❌ FAILED: $fail"
echo "TOTAL: $((pass + fail))"

Key validation criteria:

  • HTTP 200 status code (not redirects, not errors)
  • Valid XML/RSS/Atom content-type header
  • Response within 10 seconds
  • Active and maintained (a criterion added later)

The First Validation Run: Humbling Results

I ran the script on my 33 proposed feeds:

=== AI RESEARCH & INDUSTRY ===
Testing OpenAI News... ❌ FAIL (HTTP 307)
Testing Google AI Blog... ❌ FAIL (HTTP 404)
Testing DeepMind Blog... ❌ FAIL (HTTP 302)
Testing Anthropic Blog... ❌ FAIL (HTTP 404)
Testing Hugging Face Blog... ✅ PASS (HTTP 200)
Testing Machine Learning Mastery... ✅ PASS (HTTP 200)
Testing The Gradient... ✅ PASS (HTTP 200)

...

=========================================
RESULTS:
✅ PASSED: 20
❌ FAILED: 13
TOTAL: 33
=========================================

60.6% success rate.

Only 20 of my 33 “carefully selected” feeds actually worked.

The Surprises and Discoveries

OpenAI Blog Changed URLs:

  • Old URL: https://openai.com/blog/rss.xml → 307 redirect (failed)
  • New URL: https://openai.com/news/rss.xml → 200 OK (works)

Found this by searching OpenAI’s site and testing variations.
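
Another way to track down a replacement URL for a feed that fails with a 3xx is to let curl report where the redirect chain actually ends (a small helper sketch; the function name is mine):

# Follow redirects and print the final URL plus the status code it returns
resolve_feed() {
    local url="$1"
    curl -s -L --max-redirs 5 -o /dev/null \
         -w "%{url_effective} (HTTP %{http_code})\n" --max-time 10 "$url"
}

resolve_feed "https://openai.com/blog/rss.xml"   # the old, redirecting URL
# The printed final URL can then be re-tested directly with test_feed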

Anthropic Has No RSS Feed: Despite being a major AI company, Anthropic simply doesn’t provide an RSS feed. The URL https://www.anthropic.com/rss.xml returns 404.

VentureBeat is Unreliable: Returns a 308 redirect and often serves HTML instead of RSS/XML. Removed from the tier-1 list.

IEEE Spectrum Moved: The old URL returns a 301 redirect, and even after following it the feed doesn't resolve reliably.

Expanding to Repair/Maintenance Feeds

Next challenge: validate repair and maintenance feeds for DixieRoad brand.

I created a new test script for:

  • RV repair & maintenance (5 feeds tested)
  • Motorcycle repair (5 feeds tested)
  • Boat repair & maintenance (5 feeds tested)
  • Car/auto repair (5 feeds tested)
  • Truck repair (5 feeds tested)
  • General auto DIY (5 feeds tested)

Total tested: 31 repair/maintenance feeds

Results were worse:

=========================================
RESULTS:
✅ PASSED: 7
❌ FAILED: 24
TOTAL: 31
=========================================

23% success rate for repair/maintenance feeds.

Why Such Low Success Rates?

Common failure patterns (a status-code tally sketch follows this list):

  1. Redirects (301/302/307/308): 13 feeds

    • Sites changed URLs but old ones redirect
    • Redirects are often unreliable for automated systems
    • Examples: Cycle World, Boating Magazine, Truck Trend
  2. 404 Not Found (discontinued): 8 feeds

    • Feed simply doesn’t exist anymore
    • Examples: Motorcyclist, The Drive, Popular Mechanics Auto
  3. 403 Forbidden (blocked automation): 7 feeds

    • Sites actively block automated access
    • Examples: RV Travel, Boats.com, Autoblog
  4. Connection failures (timeout/000): 5 feeds

    • Site down or network issues
    • Examples: Reuters Technology, CoinDesk
  5. Other errors: 2 feeds

    • 405 Method Not Allowed (RV Repair Club)
    • 500 Server Error (The Car Connection)

Key insight: Feed reliability varies dramatically by industry. Tech feeds (75% success) are much more reliable than repair/maintenance feeds (23% success).

The Consolidation Challenge

At this point I had:

  • 20 validated tech/AI feeds
  • 7 validated repair/maintenance feeds
  • Multiple scattered documentation files
  • No single source of truth

The user's feedback: “we have fifty lists scattered… put the main list in rss atoms and sym link it to brainstorm”

This was the organizational architecture challenge. I needed:

  1. Single master list with all validated feeds
  2. All failed feeds documented with reasons
  3. Accessible from multiple projects via symlinks
  4. Multiple formats (markdown for docs, CSV for automation)

Creating the Master Architecture

File: /home/jeremy/projects/brainstorm/MASTER-RSS-FEEDS.md

# MASTER RSS FEED COLLECTION - ALL SYSTEMS

**Total Tested**: 97 feeds
**Validated Tier-1**: 52 feeds (53.6%)
**Failed/Disqualified**: 45 feeds (46.4%)

## ✅ TIER-1 VALIDATED FEEDS (52 Total)

### AI Research & Machine Learning (6 feeds)
| Feed Name | URL | Brand Assignment |
|-----------|-----|------------------|
| OpenAI News | https://openai.com/news/rss.xml | Intent Solutions (4+), StartAITools (3+) |
| Hugging Face | https://huggingface.co/blog/feed.xml | StartAITools (3+) |
...

## ❌ FAILED/DISQUALIFIED FEEDS (45 Total)

### Reason: 404 Not Found
- Google AI Blog (discontinued)
- Anthropic Blog (no RSS feed exists)
...

### Reason: Redirects (301/302/307/308)
- VentureBeat (308 redirect)
- IEEE Spectrum (301 redirect)
...

Symlink for n8n workflows:

ln -sf /home/jeremy/projects/brainstorm/MASTER-RSS-FEEDS.md \
       /home/jeremy/projects/n8n-workflows/MASTER-RSS-FEEDS.md

CSV Version for Automation

Created /home/jeremy/projects/rssatoms/TIER1_BEST_FEEDS.csv:

Category,Name,URL,Score,Quality Tier,Status,Posting Frequency,Tested
Tech News & Analysis,TechCrunch,https://techcrunch.com/feed/,100,TIER 1: High Value,200,Daily/Weekly,2025-10-03
Tech News & Analysis,The Verge,https://www.theverge.com/rss/index.xml,95,TIER 1: High Value,200,Daily/Weekly,2025-10-03
...

Added 40 new validated feeds to existing 98, bringing total to 138 tier-1 feeds.
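
With the feeds in CSV form, the same list can drive tooling outside n8n; for example, a quick re-validation pass over the URL column (a sketch that assumes the column order shown above and no commas inside fields):

# Re-test every URL in the tier-1 CSV (column 3 is the URL; skip the header row)
tail -n +2 TIER1_BEST_FEEDS.csv | cut -d',' -f3 | while read -r url; do
    code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$url")
    echo "$code $url"
done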

Pushed to GitHub: rssatoms-tier1-feeds

The Architecture: Multi-Brand Routing

With validated feeds, I could finally design the routing system:

Intent Solutions (AI Agency - Professional Authority)

Criteria: AI/Tech content, Quality Score 4+

Assigned Feeds (11):

  • OpenAI News, The Gradient, AI News, MIT News AI
  • TechCrunch, The Verge, Ars Technica, Wired
  • MIT Technology Review, Bloomberg Tech, WSJ Tech

Expected: 200-250 articles/day → 15-25 curated

StartAITools (Tech Blog - Developer Focus)

Criteria: AI/Tech/Dev content, Quality Score 3+

Assigned Feeds (23):

  • All Intent Solutions feeds (11)
  • Plus: Hugging Face, ML Mastery, Engadget, Hacker News
  • GitHub Blog, Stack Overflow, InfoQ, KrebsOnSecurity
  • The Hacker News, Cointelegraph, Android Police, 9to5Mac

Expected: 350-400 articles/day → 30-50 curated

DixieRoad (Repair/Survival/Homestead)

Criteria: Repair/Survival/Homestead, Quality Score 3+

Assigned Feeds (18):

  • Family Handyman, Bob Vila
  • Car and Driver, Road & Track, Diesel World, Auto Service World
  • RV Life, Do It Yourself RV, Camper Report
  • Backdoor Survival, Survival Blog, Urban Survival Site
  • The Prairie Homestead
  • The Truth About Guns, Shooting Illustrated, The Firearm Blog
  • Outdoor Life

Expected: 100-150 articles/day → 25-40 curated
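
As a sketch of how these brand assignments could be pulled from the master CSV instead of being hard-coded (the category patterns below are illustrative and would need to match whatever category names the CSV actually uses):

# Print the feed URLs whose Category column matches a pattern (naive CSV parsing)
brand_feeds() {
    local pattern="$1"   # e.g. "AI Research|Tech News"
    tail -n +2 TIER1_BEST_FEEDS.csv | awk -F',' -v pat="$pattern" '$1 ~ pat { print $3 }'
}

brand_feeds "AI Research|Tech News"                  # Intent Solutions / StartAITools sources
brand_feeds "Repair|Automotive|Survival|Homestead"   # DixieRoad sources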

n8n Workflow Configuration

Each feed gets an HTTP Request node:

{
  "parameters": {
    "url": "={{$json.feed_url}}",
    "options": {
      "timeout": 10000,
      "redirect": {
        "followRedirects": true,
        "maxRedirects": 3
      }
    }
  },
  "name": "Fetch_RSS_{{$json.source_name}}",
  "type": "n8n-nodes-base.httpRequest",
  "typeVersion": 4.2
}

Critical settings:

  • 10-second timeout (validated in testing)
  • Follow redirects (max 3) for feeds that moved
  • Error Continue: true (a single failing feed doesn’t stop the whole workflow)

What I Learned (The Hard Way)

1. Validate Before You Architect

I spent hours designing beautiful routing systems before testing if the data sources actually worked. Wrong order.

The right approach:

  1. Test data sources first
  2. Document what works
  3. Design architecture around validated data
  4. Build automation

Not the other way around.

2. Real-World Data is Messy

Assumptions I made that were wrong:

  • “Tech companies always have RSS feeds” (Anthropic doesn’t)
  • “Redirects mean the feed still works” (often they don’t)
  • “Industry leaders maintain feeds” (they discontinue them)
  • “All categories have equal feed quality” (23% vs 75% success rate)

Reality check: Always assume data sources are unreliable until proven otherwise.

3. Failure Documentation is as Valuable as Success

Documenting the 45 failed feeds with specific reasons saved future work:

  • No one will waste time trying Anthropic’s non-existent feed
  • We know VentureBeat redirects are unreliable
  • IEEE Spectrum moved and we have the failure documented
  • 403 errors tell us which sites block automation

This becomes institutional knowledge.

4. Category-Specific Validation Matters

The 52-percentage-point gap in success rates between tech (75%) and repair (23%) feeds taught me:

  • Different industries have different RSS maturity
  • Validation criteria may need to be category-specific
  • Volume expectations should account for category reliability
  • Backup sources are critical for low-reliability categories

5. Consolidation Before Scaling

Finding feeds scattered across 5+ locations was a red flag. I should have consolidated before testing.

New rule: Single source of truth is non-negotiable for automation systems.

6. Automation Reveals Data Quality Issues

Spot-check a few feeds by hand and they might all look fine.

Run automated checks across 97 feeds and you discover:

  • 13 with redirect issues
  • 8 discontinued
  • 7 blocking automation
  • 5 with connection problems
  • 2 with server errors

Automation is a quality audit tool, not just a productivity tool.

The Results: Production-Ready Feed System

Final Metrics

  • 97 feeds tested across 16 categories
  • 52 tier-1 validated (53.6% success rate)
  • 45 failed feeds documented with specific reasons
  • 3 brand routing systems designed
  • 138 total feeds in master CSV (including pre-existing)

Validation Scripts Created

  1. test-comprehensive-feeds.sh - Tech/AI/general feeds (45 tested)
  2. test-repair-feeds.sh - Repair/maintenance feeds (31 tested)
  3. test-rss-feeds.sh - Original tier-1 validation (33 tested)

Documentation Artifacts

  • MASTER-RSS-FEEDS.md - Single source of truth
  • TIER1_BEST_FEEDS.csv - Automation-ready format
  • Symlinks across projects for access
  • GitHub repo for version control

Expected Daily Volume (After Validation)

  • Before filtering: 650-800 articles/day across all feeds
  • After quality filtering: 70-115 curated articles/day
  • Per brand:
    • Intent Solutions: 15-25 articles
    • StartAITools: 30-50 articles
    • DixieRoad: 25-40 articles

What’s Next

Immediate: Automated Health Monitoring

The validation scripts should run weekly:

# Cron job
0 2 * * 0 /home/jeremy/projects/brainstorm/test-comprehensive-feeds.sh >> /var/log/rss-validation.log 2>&1

Alert on:

  • Feed failures (new 404s, timeouts)
  • Success rate drops below a threshold (a sketch follows this list)
  • New redirect patterns
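
A minimal version of the success-rate alert could parse the PASSED/FAILED summary lines that the validation scripts already print into the log (a sketch; the 80% threshold and the mail command are placeholders for whatever alerting is actually wired up):

#!/bin/bash
# Read the most recent PASSED/FAILED counts from the log and alert below a threshold
LOG=/var/log/rss-validation.log
pass=$(awk '/PASSED:/ {p=$NF} END {print p+0}' "$LOG")
fail=$(awk '/FAILED:/ {f=$NF} END {print f+0}' "$LOG")
total=$((pass + fail))

if [ "$total" -gt 0 ] && [ $((pass * 100 / total)) -lt 80 ]; then
    echo "RSS validation success rate dropped to $((pass * 100 / total))% ($pass/$total)" \
        | mail -s "RSS feed validation alert" ops@example.com   # placeholder address
fi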

Future: Enhanced Validation

Add checks for:

  • Content freshness (posts within the last 30 days; a sketch follows this list)
  • Feed update frequency (daily/weekly/monthly)
  • Average quality scores over time
  • Automatic category detection with AI
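
The freshness check is doable with the same curl-based tooling, for example by pulling the newest pubDate out of the feed (a rough sketch that only handles RSS 2.0 pubDate elements, not Atom <updated>, and assumes GNU date):

# Return success if the feed's newest item is no older than 30 days
feed_is_fresh() {
    local url="$1" latest cutoff
    latest=$(curl -s --max-time 10 "$url" \
        | grep -o '<pubDate>[^<]*</pubDate>' | head -1 | sed 's/<[^>]*>//g')
    [ -z "$latest" ] && return 1                    # no pubDate found: treat as unknown/stale
    cutoff=$(date -d '30 days ago' +%s)             # GNU date syntax
    [ "$(date -d "$latest" +%s)" -ge "$cutoff" ]
}

feed_is_fresh "https://techcrunch.com/feed/" && echo "fresh" || echo "stale"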

Advanced: Multi-Source Aggregation

Extend validation framework to:

  • API-based content sources (Twitter, Reddit, etc.)
  • Newsletter parsing systems
  • Social media monitoring feeds

Key Takeaways

  1. Validate data sources before building automation - saves massive amounts of rework
  2. Real-world success rates are lower than expected - plan for 50-75% reliability
  3. Document failures as thoroughly as successes - institutional knowledge
  4. Category-specific validation matters - different industries have different RSS maturity
  5. Consolidation before scaling - scattered data means unreliable automation
  6. Automation reveals quality issues - use it as an audit tool

The most important lesson? When someone asks “did you test it first?” - the answer should always be yes.

Design follows validation. Architecture follows data quality. Automation follows both.


Code Repository: rssatoms-tier1-feeds

Final Stats:

  • 97 feeds tested
  • 52 validated (53.6%)
  • 16 categories
  • 3 brands
  • 138 total tier-1 feeds in production