DevOps Onboarding at Scale: Creating Comprehensive System Analysis with Universal Templates

Posted on Oct 1, 2025

When you’re bringing a new team member onto a complex multi-project platform, the difference between success and confusion often comes down to documentation quality and access setup. Today I’m sharing the complete journey of onboarding Opeyemi Ariyo as Senior Cybersecurity Engineer to the DiagnosticPro platform - from creating comprehensive system analysis to solving real-world GCP permission challenges.

The Challenge: Multi-Project Architecture Complexity

The DiagnosticPro platform isn’t just one application - it’s a sophisticated ecosystem spanning:

diagnostic-pro-prod: Production services (Cloud Run, Firestore, Vertex AI)
diagnostic-pro-start-up: BigQuery data warehouse (266 tables)
diagnosticpro-relay: Development and relay infrastructure
React/TypeScript frontend on Firebase Hosting
Node.js Express backend with Vertex AI Gemini 2.5 Flash
Multi-terabyte data pipeline fed by YouTube, Reddit, and GitHub scrapers

When Opeyemi needed to understand and manage this system, standard “here’s the README” onboarding wasn’t going to cut it.

The Solution: Systematic Documentation Framework

Phase 1: Creating the System Analysis Template

I started with a comprehensive prompt design for creating DevOps analysis documents. The requirements were specific:

20,000-word target length for comprehensive coverage
12 major sections covering everything from architecture to incident response
Operational focus - what does the engineer need day-to-day?
Honest assessment - document what’s broken, not just what works
Universal applicability - template reusable across projects

The template structure I developed:

1. Executive Summary (3-4 paragraphs)
2. System Architecture Overview
3. Directory Deep-Dive (analyzing 01-08 numbered directories)
4. Operational Reference (deployment workflows)
5. Security & Access (IAM structure)
6. Cost & Performance (specific metrics)
7. Development Workflow
8. Dependencies & Supply Chain
9. Integration with Existing Documentation
10. Current State Assessment
11. Quick Reference (essential commands)
12. Recommendations Roadmap
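As a rough sketch of how the twelve sections above could be turned into a starting file, the helper below prints a skeleton document. The function name `scaffold_analysis` and the choice of `##` headings are illustrative assumptions, not part of the original template.

```bash
# Hypothetical helper: print a skeleton analysis document containing the
# 12 section headings from the template structure.
scaffold_analysis() {
  project_name="$1"
  printf '# %s: Complete System Analysis for DevOps Onboarding\n\n' "$project_name"
  # One section title per line of the here-document; the loop numbers them.
  n=1
  while IFS= read -r title; do
    printf '## %d. %s\n\n' "$n" "$title"
    n=$((n + 1))
  done <<'EOF'
Executive Summary
System Architecture Overview
Directory Deep-Dive
Operational Reference
Security & Access
Cost & Performance
Development Workflow
Dependencies & Supply Chain
Integration with Existing Documentation
Current State Assessment
Quick Reference
Recommendations Roadmap
EOF
}
```

One way to use it would be `scaffold_analysis "YourProject" > YourProject-analysis.md`, then fill in each section by hand.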

Phase 2: The Analysis Deep-Dive

Running this framework against DiagnosticPro revealed fascinating insights:

What’s Working Well:

Production architecture with 95%+ success rate
Proprietary 14-section AI diagnostic framework
Clean separation of concerns across GCP projects
Comprehensive data pipeline (266 BigQuery tables)

Critical Gaps Identified:

No Infrastructure as Code implementation
Missing monitoring dashboards for cross-project visibility
Documentation scattered across multiple locations
Cost optimization opportunities not being tracked

The analysis process itself was iterative - I analyzed file structures, examined configuration files, cross-referenced documentation, and built a complete operational picture.

Phase 3: Universal Template Creation

Here’s where it gets interesting. I didn’t just want to solve this problem once - I wanted to create a reusable framework. So I built a universal version with placeholders:

# [PROJECT_NAME]: Complete System Analysis for DevOps Onboarding

## Your Role
You are a senior cloud architect creating analysis for [ENGINEER_NAME]...

## Project Structure

```
[PROJECT_ROOT]/
├── docs/                              # Documentation
├── src/                               # Source code
├── [ENGINEER_NAME]_DEVOPS_GUIDE.md    # 🔑 PRIMARY REFERENCE
```


This template can now be used for any project by simply replacing:
- `[PROJECT_NAME]` with actual project name
- `[ENGINEER_NAME]` with the team member's name
- `[Cloud Provider]` with AWS/Azure/GCP specifics
- Directory structure with project-specific organization

## The Real-World Test: GCP Permission Troubleshooting

Documentation is only as good as the access that supports it. Opeyemi needed VM access in the `diagnosticpro-relay-1758728286` project, and this is where theory met reality.

### Initial Connection Failure

**The Problem:** Opeyemi couldn't connect to the VM via SSH.

**First Attempt:** Standard IAM roles
```bash
gcloud projects add-iam-policy-binding diagnosticpro-relay-1758728286 \
  --member="user:opeyemi.ariyo@gmail.com" \
  --role="roles/compute.instanceAdmin.v1"
```

Result: Still couldn’t connect. The real world is messier than the documentation.
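When a grant does not seem to take effect, a useful first step is to confirm which roles the member actually holds. The sketch below wraps the usual flatten/filter/format pattern for reading a project IAM policy; `list_member_roles` is a hypothetical helper name, and the project and member values are whatever you pass in.

```bash
# Hypothetical helper: list the roles bound to one member in one project.
list_member_roles() {
  project_id="$1"
  member="$2"   # for example: user:someone@example.com
  gcloud projects get-iam-policy "$project_id" \
    --flatten="bindings[].members" \
    --filter="bindings.members:${member}" \
    --format="value(bindings.role)"
}
```

Called as `list_member_roles diagnosticpro-relay-1758728286 user:opeyemi.ariyo@gmail.com`, it should print one role per line, which makes it easy to see whether the binding from the first attempt is present.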

The Troubleshooting Journey

Second Attempt: Added IAP tunneling access

```bash
gcloud projects add-iam-policy-binding diagnosticpro-relay-1758728286 \
  --member="user:opeyemi.ariyo@gmail.com" \
  --role="roles/iap.tunnelResourceAccessor"
```

Third Attempt: Firewall rules and network tags

```bash
gcloud compute firewall-rules create allow-ssh-via-iap \
  --source-ranges=35.235.240.0/20 \
  --target-tags=ssh-iap \
  --allow=tcp:22
```

Still failing. This is where systematic troubleshooting became crucial.
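One way to make that troubleshooting systematic is to check each prerequisite in turn before retrying the connection. The sketch below is an assumption about how such a checklist could look, not the exact commands used during the session; `check_iap_ssh_prereqs` is a hypothetical helper, while the rule name `allow-ssh-via-iap` and the tag `ssh-iap` come from the firewall command in the third attempt.

```bash
# Hypothetical checklist: inspect the pieces an IAP SSH connection depends on.
check_iap_ssh_prereqs() {
  project_id="$1"; zone="$2"; vm_name="$3"

  echo "== Network tags on ${vm_name} (expect ssh-iap) =="
  gcloud compute instances describe "$vm_name" \
    --project="$project_id" --zone="$zone" \
    --format="value(tags.items)"

  echo "== Firewall rule allow-ssh-via-iap =="
  gcloud compute firewall-rules describe allow-ssh-via-iap \
    --project="$project_id" \
    --format="yaml(sourceRanges,targetTags,allowed)"

  echo "== SSH attempt through the IAP tunnel =="
  gcloud compute ssh "$vm_name" \
    --project="$project_id" --zone="$zone" \
    --tunnel-through-iap --command="echo connected"
}
```

If the tags and the firewall rule look right and the last step still fails, the problem is likely somewhere other than network access, which is what happened here.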

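The Root Cause: OS Login Complications

The issue wasn’t just permissions - OS Login itself was failing on the VM. The organization policy constraint constraints/compute.requireOsLogin was enforcing OS Login, but OS Login wasn’t working correctly on this instance, so adding more roles and firewall rules could not fix the connection.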

Solution: Create a new VM with proper OS Login configuration:

```bash
# OS Login is enabled through instance metadata
gcloud compute instances create opeyemi-dev-vm-v2 \
  --zone=us-central1-a \
  --machine-type=e2-small \
  --metadata=enable-oslogin=TRUE \
  --tags=ssh-iap \
  --project=diagnosticpro-relay-1758728286
```
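To see whether OS Login is enforced and configured, checks along the following lines can help. This is a sketch under assumptions: `check_os_login` is a hypothetical helper, and the `describe-profile` call reports on whichever account is signed in to gcloud, so the engineer being onboarded would need to run it themselves.

```bash
# Hypothetical helper: show the OS Login policy, the VM metadata, and the
# OS Login profile of the currently authenticated gcloud user.
check_os_login() {
  project_id="$1"; zone="$2"; vm_name="$3"

  echo "== Effective org policy for compute.requireOsLogin =="
  gcloud resource-manager org-policies describe compute.requireOsLogin \
    --project="$project_id" --effective

  echo "== Instance metadata (look for enable-oslogin) =="
  gcloud compute instances describe "$vm_name" \
    --project="$project_id" --zone="$zone" \
    --format="yaml(metadata.items)"

  echo "== OS Login profile of the signed-in user =="
  gcloud compute os-login describe-profile
}
```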

The Permission Escalation Pattern

What started as “just needs VM access” became a comprehensive role-granting session:

roles/compute.instanceAdmin.v1 - Instance management
roles/iap.tunnelResourceAccessor - Secure tunneling through IAP
roles/serviceusage.serviceUsageConsumer - Inspecting and using the project's enabled services
roles/compute.osLogin - OS Login authentication
roles/iam.serviceAccountUser - Acting as the service account attached to the VM
roles/compute.networkAdmin - Network configuration

Each role was needed for specific functionality that became apparent during actual usage.
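Once the full set is known, the grants can be applied in one pass rather than one failed attempt at a time. The loop below is a minimal sketch: `grant_onboarding_roles` is a hypothetical helper, and `--condition=None` and `--quiet` are included on the assumption that you want unconditional bindings without interactive prompts.

```bash
# Hypothetical helper: bind the six roles listed above to one member.
grant_onboarding_roles() {
  project_id="$1"
  member="$2"   # for example: user:someone@example.com
  for role in \
    roles/compute.instanceAdmin.v1 \
    roles/iap.tunnelResourceAccessor \
    roles/serviceusage.serviceUsageConsumer \
    roles/compute.osLogin \
    roles/iam.serviceAccountUser \
    roles/compute.networkAdmin
  do
    # Stop at the first failed grant so the problem is visible immediately.
    gcloud projects add-iam-policy-binding "$project_id" \
      --member="$member" \
      --role="$role" \
      --condition=None \
      --quiet >/dev/null || return 1
    echo "granted ${role}"
  done
}
```

Keeping the role list in one place also gives the next onboarding a reviewable starting point, so roles that turn out to be unnecessary can be removed from the list rather than forgotten.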

Key Lessons from Real Implementation

1. Documentation Must Include Failure Modes

The most valuable part of my analysis wasn’t the “here’s how it works” sections - it was the “here’s what breaks and how to fix it” content. Real engineers need debugging paths, not just happy paths.

2. Permissions Are Fractal

What looks like “simple VM access” expands into network rules, service accounts, OS Login policies, and organization constraints. Plan for permission escalation.

3. Templates Need Testing

The universal template framework only works if it’s been tested against real complexity. DiagnosticPro’s multi-project architecture was perfect for stress-testing the approach.

4. IAP vs Traditional SSH

We chose Identity-Aware Proxy over traditional SSH for security reasons:

No need to manage SSH keys across team members
Centralized access logging and audit trails
Integration with existing IAM policies
Network-level isolation (no public IPs needed)
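Because IAP forwards TCP rather than only SSH, the same access path can reach other ports on a VM that has no public IP. The sketch below shows one way to open such a tunnel; `open_iap_tunnel` is a hypothetical helper, and the port numbers in the usage note are arbitrary examples.

```bash
# Hypothetical helper: forward a local port to a port on the VM through IAP.
open_iap_tunnel() {
  project_id="$1"; zone="$2"; vm_name="$3"
  remote_port="$4"; local_port="$5"
  gcloud compute start-iap-tunnel "$vm_name" "$remote_port" \
    --local-host-port="localhost:${local_port}" \
    --project="$project_id" --zone="$zone"
}
```

For example, `open_iap_tunnel diagnosticpro-relay-1758728286 us-central1-a opeyemi-dev-vm-v2 22 2222` would expose the VM's SSH port on localhost:2222 for as long as the command keeps running. The same IAM role and firewall rule used for SSH govern this tunnel.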

5. Cost Awareness in Infrastructure Decisions

When I initially created an e2-medium VM ($25/month), Jeremy’s immediate response was: “dont ever assume i have extra money delete that and give him small never make an action that incurs costs on my behalf”

This taught me that infrastructure decisions always have business context. The e2-small instance (roughly half the on-demand cost of the e2-medium) was perfectly adequate for the development use case.
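A lightweight guard against this kind of surprise is to list any VM whose machine type differs from the one that was agreed. The sketch below assumes a single approved type per project; `flag_unexpected_machine_types` is a hypothetical helper name.

```bash
# Hypothetical helper: print "name type" for every VM in the project whose
# machine type is not the approved one.
flag_unexpected_machine_types() {
  project_id="$1"
  allowed_type="$2"   # for example: e2-small
  gcloud compute instances list --project="$project_id" \
    --format="value(name,machineType.basename())" |
  while read -r name machine_type; do
    if [ -n "$name" ] && [ "$machine_type" != "$allowed_type" ]; then
      echo "${name} ${machine_type}"
    fi
  done
}
```

Running `flag_unexpected_machine_types diagnosticpro-relay-1758728286 e2-small` before and after provisioning would have surfaced the e2-medium instance straight away.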

The Universal Framework in Action

Here’s how this approach can be applied to any project:

Step 1: Template Customization

```bash
# Replace placeholders with project specifics
sed -i 's/\[PROJECT_NAME\]/YourProject/g' analysis-template.md
sed -i 's/\[ENGINEER_NAME\]/TeamMember/g' analysis-template.md
```
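In-place edits overwrite the template, so a variant that writes the rendered copy to standard output and then reports any placeholders left unfilled may be safer. This is a sketch: `render_template` and `find_unfilled_placeholders` are hypothetical helpers, and the pattern only catches upper-case placeholders such as `[PROJECT_ROOT]`.

```bash
# Hypothetical helper: render the template to stdout, leaving the original
# file untouched.
render_template() {
  template_file="$1"
  project_name="$2"
  engineer_name="$3"
  # '|' is the sed delimiter so values may contain '/'.
  # Values containing '|', '&' or '\' would need escaping first.
  sed -e "s|\[PROJECT_NAME\]|${project_name}|g" \
      -e "s|\[ENGINEER_NAME\]|${engineer_name}|g" \
      "$template_file"
}

# Hypothetical helper: print "line:text" for each remaining [UPPER_CASE]
# placeholder. Returns non-zero when none are found (grep's convention).
find_unfilled_placeholders() {
  grep -n '\[[A-Z][A-Z_]*]' "$1"
}
```

A typical run would be `render_template analysis-template.md YourProject TeamMember > YourProject-analysis.md` followed by `find_unfilled_placeholders YourProject-analysis.md`. Mixed-case placeholders such as `[Cloud Provider]` would need a broader pattern.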

Step 2: Systematic Analysis

Start with README and root files
Analyze directory structure (preferably numbered like 01-docs, 02-src)
Cross-reference documentation with actual implementation
Document discrepancies honestly
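Part of the cross-referencing step can be automated by checking that paths mentioned in the documentation exist in the repository. The sketch below assumes paths are written in backticks and contain a slash; `find_missing_paths` is a hypothetical helper, and anything it reports still needs a human look.

```bash
# Hypothetical helper: report backticked paths in a markdown file that do
# not exist under the given repository root.
find_missing_paths() {
  doc_file="$1"
  repo_root="$2"
  # Extract `backticked` tokens, keep the ones that look like paths.
  grep -o '`[^`]*`' "$doc_file" | tr -d '`' | grep '/' | sort -u |
  while IFS= read -r path; do
    [ -e "${repo_root}/${path}" ] || echo "missing: ${path}"
  done
}
```

For example, `find_missing_paths README.md .` lists every documented path with no matching file or directory, which gives a concrete list of discrepancies to record.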

Step 3: Operational Testing

Try the access procedures yourself
Document failure modes and workarounds
Test permissions with actual use cases
Update documentation based on real findings

Results and Metrics

Documentation Impact:

20,000+ word comprehensive system analysis created
Universal template framework for future onboarding
Reduced onboarding time from weeks to days

Technical Resolution:

Complex multi-project GCP permissions resolved
Secure IAP tunneling implemented
Development environment operational

Process Innovation:

Reusable template framework created
Real-world permission troubleshooting documented
Cost-conscious infrastructure decisions established

The Template Repository

I’ve saved both the DiagnosticPro-specific analysis and the universal template to the prompts-intent-solutions repository:

diagnosticpro-comprehensive-system-analysis-prompt.md - The complete DiagnosticPro analysis
comprehensive-system-analysis-prompt-universal.md - Universal template for any project

What’s Next

This framework is now being used for:

Onboarding additional team members to other projects
Creating system documentation for client handoffs
Standardizing DevOps analysis across the portfolio
Building institutional knowledge that survives team changes

The combination of comprehensive documentation and systematic permission management creates a foundation for sustainable team growth.

Related Posts

Building AI-Friendly Codebase Documentation: A Real-Time CLAUDE.md Creation Journey - Deep dive into AI-assisted documentation creation
Waygate MCP v2.1.0: From Forensic Analysis to Production Enterprise Server - Systematic infrastructure analysis and resolution
Comprehensive Technical Guide to SSH, Debian Packages, and Grep - Technical reference for Linux system administration

This post documents real engineering work performed October 1, 2025. The universal template framework is available in the prompts-intent-solutions repository for adaptation to your own projects.
