vendor_report/TASKBOARD_INTEGRATION_CONTEXT.md
2025-11-06 20:50:19 +04:00

14 KiB

Vendor Report Generator - Taskboard Integration Context

🎯 Goal & Purpose

The Vendor Report Generator is a Python-based tool designed to automate the generation of comprehensive vendor punchlist reports from Excel files stored in SharePoint. The goal is to:

  1. Automate Report Generation: Eliminate manual Excel processing and report creation
  2. Centralize Data: Pull vendor punchlist data directly from SharePoint
  3. Provide Insights: Generate actionable reports with metrics, priorities, and status tracking
  4. Enable Integration: Make reports accessible within Taskboard for team collaboration

Business Value

  • Time Savings: Automates hours of manual report generation
  • Accuracy: Consistent data normalization and calculation
  • Visibility: Real-time vendor status tracking and metrics
  • Accessibility: Web-based interface for non-technical users
  • Integration Ready: Can be embedded as a tool/widget in Taskboard

📋 Application Overview

What It Does

The application processes Excel files containing vendor punchlist items and generates:

  • Interactive HTML Reports: Searchable, filterable web reports with vendor tabs, status filters, and priority grouping
  • JSON Data: Structured data for further processing or API integration
  • Metrics: Per-vendor statistics (total items, closed/open counts, 24-hour updates, oldest unaddressed items)

Key Features

  1. Excel Processing: Direct pandas-based reading (no manual conversion needed)
  2. Data Normalization: Automatically handles vendor name variations, status inconsistencies, priority classifications
  3. 24-Hour Tracking: Identifies items added, closed, or changed to monitor status in the last 24 hours (Baltimore/Eastern timezone)
  4. Priority Classification: Groups items by Very High, High, Medium, Low priorities
  5. Oldest Items: Highlights the oldest 3 unaddressed items per vendor
  6. SharePoint Integration: Automatically downloads Excel files from SharePoint
  7. Scheduled Generation: Can run automatically on a schedule
  8. Web UI: User-friendly interface for generating reports
  9. REST API: Programmatic access for integration

🏗️ Architecture & Components

Core Components

vendor_report/
├── report_generator.py      # Main entry point - orchestrates report generation
├── data_preprocessor.py     # Excel parsing, normalization, data cleaning
├── html_generator.py        # Generates interactive HTML reports
├── models.py                # Pydantic data models for validation
├── sharepoint_downloader.py  # SharePoint file downloader
├── scheduler.py             # Scheduled report generation
├── api_server.py            # REST API server
├── web_ui.py                # Web UI server (Flask-based)
├── config.py                # Configuration management
└── config.yaml              # Configuration file

Data Flow

SharePoint Excel Files
    ↓
[SharePoint Downloader] → Local reports/ directory
    ↓
[Data Preprocessor] → Normalize vendors, statuses, priorities, parse dates
    ↓
[Report Generator] → Calculate metrics, group by vendor, identify updates
    ↓
[HTML Generator] → Generate interactive report.html
    ↓
[Output] → output/report.json + output/report.html

Processing Pipeline

  1. Input: Excel files with columns:

    • Punchlist Name, Vendor, Priority, Description, Date Identified, Status Updates, Issue Image, Status, Date Completed
  2. Preprocessing:

    • Parse Excel files using pandas
    • Normalize vendor names (handle case variations, combined vendors)
    • Normalize statuses (Complete, Monitor, Incomplete)
    • Classify priorities (Very High, High, Medium, Low)
    • Parse dates (multiple formats supported)
    • Calculate 24-hour windows (Baltimore/Eastern timezone)
    • Calculate item age (days since identified)
  3. Report Generation:

    • Group items by vendor
    • Calculate metrics per vendor (total, closed, open, monitor counts)
    • Identify 24-hour updates (added, closed, changed to monitor)
    • Find oldest 3 unaddressed items per vendor
    • Group by priority levels
    • Generate JSON structure
    • Generate HTML report
  4. Output:

    • output/report.json: Structured JSON data
    • output/report.html: Interactive HTML report
    • output/preprocessed_data.txt: Debug/preview data

🔧 Technical Details

Dependencies

# Core
pandas>=2.0.0          # Excel processing
openpyxl>=3.0.0         # Excel file reading
pydantic>=2.0.0         # Data validation

# Optional: SharePoint
Office365-REST-Python-Client>=2.3.0  # SharePoint API

# Optional: Scheduling
apscheduler>=3.10.0     # Task scheduling

# Optional: Web UI/API
flask>=2.3.0            # Web framework
flask-cors>=4.0.0       # CORS support

# Configuration
pyyaml>=6.0             # YAML config parsing

Configuration

Configuration is managed via config.yaml:

sharepoint:
  enabled: true/false
  site_url: "https://company.sharepoint.com/sites/SiteName"
  folder_path: "/Shared Documents/Reports"
  use_app_authentication: true  # Azure AD app auth (recommended)
  client_id: "azure-ad-client-id"
  client_secret: "azure-ad-client-secret"

scheduler:
  enabled: true/false
  schedule_type: "interval" | "cron" | "once"
  interval_hours: 24
  cron_expression: "0 8 * * *"  # 8 AM daily

api:
  enabled: true/false
  port: 8080
  api_key: "optional-api-key"

report:
  output_dir: "output"
  reports_dir: "reports"

API Endpoints

Web UI Server (web_ui.py):

  • GET / - Web UI interface
  • POST /api/generate - Generate report
  • POST /api/update-sharepoint - Download files from SharePoint
  • GET /api/status - Service status
  • GET /api/reports - List generated reports
  • GET /api/config - Configuration (safe, no secrets)
  • GET /reports/<filename> - Serve report files

API Server (api_server.py):

  • POST /api/generate - Generate report (programmatic)
  • GET /api/status - Service status
  • GET /health - Health check

Data Models

PunchlistItem:

  • punchlist_name, description, priority, date_identified, date_completed
  • status, status_updates, issue_image, age_days

VendorMetrics:

  • vendor_name, total_items, closed_count, open_count, monitor_count
  • updates_24h (added, closed, changed_to_monitor)
  • oldest_unaddressed (top 3)
  • very_high_priority_items, high_priority_items

FullReport:

  • report_generated_at, vendors[], summary{}

🔗 Taskboard Integration Possibilities

Option 1: Embedded Widget/Page

Create a new page in Taskboard (/vendor-reports) that:

  • Uses Taskboard's authentication (already authenticated users)
  • Embeds the generated HTML report in an iframe or renders it directly
  • Provides a button to trigger report generation
  • Shows report history/list

Implementation:

// taskboard/src/app/(dashboard)/vendor-reports/page.tsx
// - Call Python API server to generate reports
// - Display generated HTML reports
// - Use Taskboard's UI components for consistency

Option 2: API Integration

Create Taskboard API routes that proxy to the Python API:

  • POST /api/vendor-reports/generate → Calls Python POST /api/generate
  • GET /api/vendor-reports/list → Calls Python GET /api/reports
  • GET /api/vendor-reports/status → Calls Python GET /api/status

Benefits:

  • Single authentication system (Taskboard)
  • Consistent API patterns
  • Can add Taskboard-specific features (notifications, task linking)

Option 3: Background Service

Run the Python scheduler as a background service that:

  • Generates reports on schedule
  • Saves reports to a shared location
  • Taskboard displays the latest report
  • Can trigger notifications when reports are updated

Option 4: Task Integration

Link reports to Taskboard tasks:

  • Create tasks for vendors with unaddressed items
  • Link report generation to project/task completion
  • Use report metrics in task dashboards

🚀 Usage Examples

Command Line

# Generate report from local files
python report_generator.py

# Generate with custom directories
python report_generator.py --reports-dir /path/to/excel --output /path/to/output.json

Web UI

# Start web UI server
python web_ui.py

# Open browser: http://localhost:8080
# Click "Update Data from SharePoint" → "Generate Report"

API

# Generate report via API
curl -X POST http://localhost:8080/api/generate \
  -H "Content-Type: application/json" \
  -d '{"download_from_sharepoint": false}'

# Update from SharePoint
curl -X POST http://localhost:8080/api/update-sharepoint

Scheduled

# Start scheduler (runs continuously)
python scheduler.py

# Configured via config.yaml:
# scheduler:
#   enabled: true
#   schedule_type: "cron"
#   cron_expression: "0 8 * * *"  # 8 AM daily

Programmatic (Python)

from report_generator import generate_report

# Generate report
report_data = generate_report(
    reports_dir="reports",
    output_file="output/report.json",
    verbose=True
)

# Access data
vendors = report_data['vendors']
summary = report_data['summary']

📊 Report Structure

JSON Report Format

{
  "report_generated_at": "2025-11-06T16:00:00",
  "vendors": [
    {
      "vendor_name": "VendorName",
      "total_items": 10,
      "closed_count": 5,
      "open_count": 3,
      "monitor_count": 2,
      "updates_24h": {
        "added": [...],
        "closed": [...],
        "changed_to_monitor": [...]
      },
      "oldest_unaddressed": [...],
      "very_high_priority_items": [...],
      "high_priority_items": [...],
      "closed_items": [...],
      "monitor_items": [...],
      "open_items": [...]
    }
  ],
  "summary": {
    "total_vendors": 5,
    "total_items": 50,
    "total_closed": 25,
    "total_open": 15,
    "total_monitor": 10
  }
}

HTML Report Features

  • Summary Cards: Overview statistics
  • Vendor Tabs: Quick navigation between vendors
  • Status Tabs: Filter by All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open
  • Search & Filters: Search by name/description, filter by vendor/status/priority
  • Quick Filters: Show only vendors with updates or oldest items
  • Responsive Design: Works on desktop and mobile

🔐 Authentication & Security

Current State

  • Web UI: Optional API key authentication
  • SharePoint: Azure AD app authentication (recommended) or user credentials
  • No User Management: Standalone application

Taskboard Integration Benefits

  • Leverage Existing Auth: Use Taskboard's Authentik/Microsoft Entra ID authentication
  • Role-Based Access: Control who can generate/view reports
  • Audit Trail: Track who generated reports (via Taskboard user system)
  • Secure Storage: Use Taskboard's file storage for reports

📝 Integration Checklist

Phase 1: Basic Integration

  • Set up Python API server as background service
  • Create Taskboard API route that proxies to Python API
  • Create Taskboard page to display reports
  • Add "Generate Report" button in Taskboard UI

Phase 2: Enhanced Integration

  • Use Taskboard authentication for report access
  • Store report metadata in Taskboard database
  • Add report history/versioning
  • Link reports to projects/tasks

Phase 3: Advanced Features

  • Scheduled report generation via Taskboard
  • Notifications when reports are generated
  • Dashboard widgets showing report metrics
  • Export reports to Taskboard tasks/boards

🛠️ Development Notes

Running Locally

# Setup
cd vendor_report
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure
cp config.yaml.template config.yaml
# Edit config.yaml with SharePoint credentials

# Run Web UI
python web_ui.py
# Open http://localhost:8080

Deployment Considerations

  • Python Environment: Requires Python 3.8+
  • Dependencies: Install via pip
  • Configuration: Store secrets securely (environment variables or vault)
  • Port: Default 8080 (configurable)
  • File Storage: Reports saved to output/ directory
  • SharePoint: Requires Azure AD app registration

Error Handling

  • Graceful handling of missing Excel files
  • SharePoint connection errors logged
  • Invalid data formats handled
  • User-friendly error messages in Web UI

📚 Additional Resources

  • SharePoint Setup: See SHAREPOINT_SETUP.md for detailed Azure AD configuration
  • Quick Start: See QUICK_START.md for 5-minute setup guide
  • Full Documentation: See README.md for complete usage guide

💡 Integration Ideas for Taskboard

  1. Vendor Dashboard: Show vendor metrics as cards/widgets
  2. Report History: Track when reports were generated, by whom
  3. Task Creation: Auto-create tasks for vendors with oldest unaddressed items
  4. Notifications: Alert project managers when reports are generated
  5. Export to Tasks: Convert report items to Taskboard tasks
  6. Project Linking: Associate reports with Taskboard projects
  7. Scheduled Reports: Use Taskboard's scheduling to trigger reports
  8. Role-Based Views: Different report views for different user roles

🔄 Current Status

  • Core functionality complete
  • SharePoint integration working
  • Web UI functional
  • API endpoints available
  • Scheduled generation supported
  • Taskboard integration pending
  • Authentication integration pending
  • Database storage pending

Last Updated: November 6, 2025
Version: 1.0
Status: Production Ready (Standalone), Integration Ready (Taskboard)