DevOps Onboarding at Scale: Creating Comprehensive System Analysis with Universal Templates

Posted on Oct 1, 2025

DevOps Onboarding at Scale: Creating Comprehensive System Analysis with Universal Templates

When you’re bringing a new team member onto a complex multi-project platform, the difference between success and confusion often comes down to documentation quality and access setup. Today I’m sharing the complete journey of onboarding Opeyemi Ariyo as Senior Cybersecurity Engineer to the DiagnosticPro platform - from creating comprehensive system analysis to solving real-world GCP permission challenges.

The Challenge: Multi-Project Architecture Complexity

The DiagnosticPro platform isn’t just one application - it’s a sophisticated ecosystem spanning:

  • diagnostic-pro-prod: Production services (Cloud Run, Firestore, Vertex AI)
  • diagnostic-pro-start-up: BigQuery data warehouse (266 tables)
  • diagnosticpro-relay: Development and relay infrastructure
  • React/TypeScript frontend on Firebase Hosting
  • Node.js Express backend with Vertex AI Gemini 2.5 Flash
  • Multi-terabyte data pipeline from YouTube, Reddit, GitHub scrapers

When Opeyemi needed to understand and manage this system, standard “here’s the README” onboarding wasn’t going to cut it.

The Solution: Systematic Documentation Framework

Phase 1: Creating the System Analysis Template

I started with a comprehensive prompt design for creating DevOps analysis documents. The requirements were specific:

  • 20,000-word target length for comprehensive coverage
  • 12 major sections covering everything from architecture to incident response
  • Operational focus - what does the engineer need day-to-day?
  • Honest assessment - document what’s broken, not just what works
  • Universal applicability - template reusable across projects

The template structure I developed:

1. Executive Summary (3-4 paragraphs)
2. System Architecture Overview
3. Directory Deep-Dive (analyzing 01-08 numbered directories)
4. Operational Reference (deployment workflows)
5. Security & Access (IAM structure)
6. Cost & Performance (specific metrics)
7. Development Workflow
8. Dependencies & Supply Chain
9. Integration with Existing Documentation
10. Current State Assessment
11. Quick Reference (essential commands)
12. Recommendations Roadmap

Phase 2: The Analysis Deep-Dive

Running this framework against DiagnosticPro revealed fascinating insights:

What’s Working Well:

  • Production architecture with 95%+ success rate
  • Proprietary 14-section AI diagnostic framework
  • Clean separation of concerns across GCP projects
  • Comprehensive data pipeline (266 BigQuery tables)

Critical Gaps Identified:

  • No Infrastructure as Code implementation
  • Missing monitoring dashboards for cross-project visibility
  • Documentation scattered across multiple locations
  • Cost optimization opportunities not being tracked

The analysis process itself was iterative - I analyzed file structures, examined configuration files, cross-referenced documentation, and built a complete operational picture.

Phase 3: Universal Template Creation

Here’s where it gets interesting. I didn’t just want to solve this problem once - I wanted to create a reusable framework. So I built a universal version with placeholders:

# [PROJECT_NAME]: Complete System Analysis for DevOps Onboarding

## Your Role
You are a senior cloud architect creating analysis for [ENGINEER_NAME]...

## Project Structure

[PROJECT_ROOT]/ ├── docs/ # Documentation ├── src/ # Source code ├── [ENGINEER_NAME]_DEVOPS_GUIDE.md # 🔑 PRIMARY REFERENCE


This template can now be used for any project by simply replacing:
- `[PROJECT_NAME]` with actual project name
- `[ENGINEER_NAME]` with the team member's name
- `[Cloud Provider]` with AWS/Azure/GCP specifics
- Directory structure with project-specific organization

## The Real-World Test: GCP Permission Troubleshooting

Documentation is only as good as the access that supports it. Opeyemi needed VM access in the `diagnosticpro-relay-1758728286` project, and this is where theory met reality.

### Initial Connection Failure

**The Problem:** Opeyemi couldn't connect to the VM via SSH.

**First Attempt:** Standard IAM roles
```bash
gcloud projects add-iam-policy-binding diagnosticpro-relay-1758728286 \
  --member="user:opeyemi.ariyo@gmail.com" \
  --role="roles/compute.instanceAdmin.v1"

Result: Still couldn’t connect. The real world is messier than the documentation.

The Troubleshooting Journey

Second Attempt: Added IAP tunneling access

gcloud projects add-iam-policy-binding diagnosticpro-relay-1758728286 \
  --member="user:opeyemi.ariyo@gmail.com" \
  --role="roles/iap.tunnelResourceAccessor"

Third Attempt: Firewall rules and network tags

gcloud compute firewall-rules create allow-ssh-via-iap \
  --source-ranges=35.235.240.0/20 \
  --target-tags=ssh-iap \
  --allow=tcp:22

Still failing. This is where systematic troubleshooting became crucial.

The Root Cause: OS Login Complications

The issue wasn’t just permissions - the VM had OS Login service problems. Organization policy constraints/compute.requireOsLogin was enforcing OS Login, but the service wasn’t working correctly.

Solution: Create a new VM with proper OS Login configuration:

gcloud compute instances create opeyemi-dev-vm-v2 \
  --zone=us-central1-a \
  --machine-type=e2-small \
  --enable-oslogin \
  --tags=ssh-iap \
  --project=diagnosticpro-relay-1758728286

The Permission Escalation Pattern

What started as “just needs VM access” became a comprehensive permission grant session:

  1. compute.instanceAdmin.v1 - Instance management
  2. iap.tunnelResourceAccessor - Secure tunneling
  3. serviceusage.serviceUsageConsumer - Service discovery
  4. compute.osLogin - OS Login authentication
  5. iam.serviceAccountUser - Service account impersonation
  6. compute.networkAdmin - Network configuration

Each permission was needed for specific functionality that became apparent during actual usage.

Key Lessons from Real Implementation

1. Documentation Must Include Failure Modes

The most valuable part of my analysis wasn’t the “here’s how it works” sections - it was the “here’s what breaks and how to fix it” content. Real engineers need debugging paths, not just happy paths.

2. Permissions Are Fractal

What looks like “simple VM access” expands into network rules, service accounts, OS Login policies, and organization constraints. Plan for permission escalation.

3. Templates Need Testing

The universal template framework only works if it’s been tested against real complexity. DiagnosticPro’s multi-project architecture was perfect for stress-testing the approach.

4. IAP vs Traditional SSH

We chose Identity-Aware Proxy over traditional SSH for security reasons:

  • No need to manage SSH keys across team members
  • Centralized access logging and audit trails
  • Integration with existing IAM policies
  • Network-level isolation (no public IPs needed)

5. Cost Awareness in Infrastructure Decisions

When I initially created an e2-medium VM ($25/month), Jeremy’s immediate response was: “dont ever assume i have extra money delete that and give him small never make an action that incurs costs on my behalf”

This taught me that infrastructure decisions always have business context. The e2-small instance ($7/month) was perfectly adequate for the development use case.

The Universal Framework in Action

Here’s how this approach can be applied to any project:

Step 1: Template Customization

# Replace placeholders with project specifics
sed -i 's/\[PROJECT_NAME\]/YourProject/g' analysis-template.md
sed -i 's/\[ENGINEER_NAME\]/TeamMember/g' analysis-template.md

Step 2: Systematic Analysis

  • Start with README and root files
  • Analyze directory structure (preferably numbered like 01-docs, 02-src)
  • Cross-reference documentation with actual implementation
  • Document discrepancies honestly

Step 3: Operational Testing

  • Try the access procedures yourself
  • Document failure modes and workarounds
  • Test permissions with actual use cases
  • Update documentation based on real findings

Results and Metrics

Documentation Impact:

  • 20,000+ word comprehensive system analysis created
  • Universal template framework for future onboarding
  • Reduced onboarding time from weeks to days

Technical Resolution:

  • Complex multi-project GCP permissions resolved
  • Secure IAP tunneling implemented
  • Development environment operational

Process Innovation:

  • Reusable template framework created
  • Real-world permission troubleshooting documented
  • Cost-conscious infrastructure decisions established

The Template Repository

I’ve saved both the DiagnosticPro-specific analysis and the universal template to the prompts-intent-solutions repository:

  • diagnosticpro-comprehensive-system-analysis-prompt.md - The complete DiagnosticPro analysis
  • comprehensive-system-analysis-prompt-universal.md - Universal template for any project

What’s Next

This framework is now being used for:

  • Onboarding additional team members to other projects
  • Creating system documentation for client handoffs
  • Standardizing DevOps analysis across the portfolio
  • Building institutional knowledge that survives team changes

The combination of comprehensive documentation and systematic permission management creates a foundation for sustainable team growth.


This post documents real engineering work performed October 1, 2025. The universal template framework is available in the prompts-intent-solutions repository for adaptation to your own projects.