Vendor Report Generator - Taskboard Integration Context
🎯 Goal & Purpose
The Vendor Report Generator is a Python-based tool designed to automate the generation of comprehensive vendor punchlist reports from Excel files stored in SharePoint. The goal is to:
- Automate Report Generation: Eliminate manual Excel processing and report creation
- Centralize Data: Pull vendor punchlist data directly from SharePoint
- Provide Insights: Generate actionable reports with metrics, priorities, and status tracking
- Enable Integration: Make reports accessible within Taskboard for team collaboration
Business Value
- Time Savings: Automates hours of manual report generation
- Accuracy: Consistent data normalization and calculation
- Visibility: Real-time vendor status tracking and metrics
- Accessibility: Web-based interface for non-technical users
- Integration Ready: Can be embedded as a tool/widget in Taskboard
📋 Application Overview
What It Does
The application processes Excel files containing vendor punchlist items and generates:
- Interactive HTML Reports: Searchable, filterable web reports with vendor tabs, status filters, and priority grouping
- JSON Data: Structured data for further processing or API integration
- Metrics: Per-vendor statistics (total items, closed/open counts, 24-hour updates, oldest unaddressed items)
Key Features
- Excel Processing: Direct pandas-based reading (no manual conversion needed)
- Data Normalization: Automatically handles vendor name variations, status inconsistencies, priority classifications
- 24-Hour Tracking: Identifies items that were added, closed, or moved to Monitor status in the last 24 hours (Baltimore/Eastern timezone)
- Priority Classification: Groups items by Very High, High, Medium, Low priorities
- Oldest Items: Highlights the oldest 3 unaddressed items per vendor
- SharePoint Integration: Automatically downloads Excel files from SharePoint
- Scheduled Generation: Can run automatically on a schedule
- Web UI: User-friendly interface for generating reports
- REST API: Programmatic access for integration
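The 24-hour tracking feature can be pictured with a minimal sketch. It assumes a rolling window that ends at the current Eastern time and that naive Excel timestamps are Eastern wall-clock times; the actual rule in data_preprocessor.py may differ. The names `eastern_tz` and `in_last_24h` are hypothetical.

```python
from datetime import datetime, timedelta, tzinfo
from typing import Optional


def eastern_tz() -> tzinfo:
    """Return the US Eastern zone (zoneinfo needs Python 3.9+ and system tz data)."""
    from zoneinfo import ZoneInfo  # imported lazily so in_last_24h works without it
    return ZoneInfo("America/New_York")


def in_last_24h(event_time: Optional[datetime], now: datetime, local_tz: tzinfo) -> bool:
    """True if event_time falls within the 24 hours ending at `now` (timezone-aware).

    Naive timestamps are assumed to be wall-clock times in local_tz.
    """
    if event_time is None:
        return False
    if event_time.tzinfo is None:
        event_time = event_time.replace(tzinfo=local_tz)
    return now - timedelta(hours=24) <= event_time <= now
```

A caller would typically pass `now=datetime.now(eastern_tz())` and `local_tz=eastern_tz()`, so the window follows daylight saving changes.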
🏗️ Architecture & Components
Core Components
vendor_report/
├── report_generator.py # Main entry point - orchestrates report generation
├── data_preprocessor.py # Excel parsing, normalization, data cleaning
├── html_generator.py # Generates interactive HTML reports
├── models.py # Pydantic data models for validation
├── sharepoint_downloader.py # SharePoint file downloader
├── scheduler.py # Scheduled report generation
├── api_server.py # REST API server
├── web_ui.py # Web UI server (Flask-based)
├── config.py # Configuration management
└── config.yaml # Configuration file
Data Flow
SharePoint Excel Files
↓
[SharePoint Downloader] → Local reports/ directory
↓
[Data Preprocessor] → Normalize vendors, statuses, priorities, parse dates
↓
[Report Generator] → Calculate metrics, group by vendor, identify updates
↓
[HTML Generator] → Generate interactive report.html
↓
[Output] → output/report.json + output/report.html
Processing Pipeline
- Input: Excel files with columns:
  - Punchlist Name, Vendor, Priority, Description, Date Identified, Status Updates, Issue Image, Status, Date Completed
- Preprocessing:
  - Parse Excel files using pandas
  - Normalize vendor names (handle case variations, combined vendors)
  - Normalize statuses (Complete, Monitor, Incomplete)
  - Classify priorities (Very High, High, Medium, Low)
  - Parse dates (multiple formats supported)
  - Calculate 24-hour windows (Baltimore/Eastern timezone)
  - Calculate item age (days since identified)
- Report Generation:
  - Group items by vendor
  - Calculate metrics per vendor (total, closed, open, monitor counts)
  - Identify 24-hour updates (added, closed, changed to monitor)
  - Find oldest 3 unaddressed items per vendor
  - Group by priority levels
  - Generate JSON structure
  - Generate HTML report
- Output:
  - output/report.json: Structured JSON data
  - output/report.html: Interactive HTML report
  - output/preprocessed_data.txt: Debug/preview data
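The normalization steps in the preprocessing stage might look roughly like the sketch below. The alias table, the separator characters, and all function names are assumptions made for illustration; the real mappings live in data_preprocessor.py.

```python
import re
from typing import List, Optional

# Hypothetical alias table; anything not listed falls back to "Incomplete".
STATUS_ALIASES = {
    "complete": "Complete", "completed": "Complete", "closed": "Complete",
    "monitor": "Monitor", "monitoring": "Monitor",
}
# "Very High" is listed before "High" so the longer label wins the substring match.
PRIORITY_LEVELS = ["Very High", "High", "Medium", "Low"]


def normalize_status(raw: Optional[str]) -> str:
    """Map a free-text status to Complete, Monitor, or Incomplete."""
    key = (raw or "").strip().lower()
    return STATUS_ALIASES.get(key, "Incomplete")


def classify_priority(raw: Optional[str]) -> Optional[str]:
    """Find a known priority level inside free text such as '(1) Very High'."""
    text = re.sub(r"[^a-z ]", " ", (raw or "").lower())
    text = " ".join(text.split())
    for level in PRIORITY_LEVELS:
        if level.lower() in text:
            return level
    return None


def split_vendors(raw: Optional[str]) -> List[str]:
    """Split a combined vendor cell ('acme / BETA co') and unify the casing."""
    parts = re.split(r"[/,&]", raw or "")
    return [p.strip().title() for p in parts if p.strip()]
```

Title-casing is a simple way to merge case variations, but it would also rewrite acronym-style vendor names, so a lookup table of canonical names may be the better choice in practice.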
🔧 Technical Details
Dependencies
# Core
pandas>=2.0.0 # Excel processing
openpyxl>=3.0.0 # Excel file reading
pydantic>=2.0.0 # Data validation
# Optional: SharePoint
Office365-REST-Python-Client>=2.3.0 # SharePoint API
# Optional: Scheduling
apscheduler>=3.10.0 # Task scheduling
# Optional: Web UI/API
flask>=2.3.0 # Web framework
flask-cors>=4.0.0 # CORS support
# Configuration
pyyaml>=6.0 # YAML config parsing
Configuration
Configuration is managed via config.yaml:
sharepoint:
  enabled: true  # or false
  site_url: "https://company.sharepoint.com/sites/SiteName"
  folder_path: "/Shared Documents/Reports"
  use_app_authentication: true  # Azure AD app auth (recommended)
  client_id: "azure-ad-client-id"
  client_secret: "azure-ad-client-secret"
scheduler:
  enabled: true  # or false
  schedule_type: "interval"  # one of "interval", "cron", "once"
  interval_hours: 24
  cron_expression: "0 8 * * *"  # 8 AM daily
api:
  enabled: true  # or false
  port: 8080
  api_key: "optional-api-key"
report:
  output_dir: "output"
  reports_dir: "reports"
API Endpoints
Web UI Server (web_ui.py):
- GET / - Web UI interface
- POST /api/generate - Generate report
- POST /api/update-sharepoint - Download files from SharePoint
- GET /api/status - Service status
- GET /api/reports - List generated reports
- GET /api/config - Configuration (safe, no secrets)
- GET /reports/<filename> - Serve report files
API Server (api_server.py):
- POST /api/generate - Generate report (programmatic)
- GET /api/status - Service status
- GET /health - Health check
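A Python client for POST /api/generate could be sketched with the standard library alone. The request body mirrors the curl example in this document; the `X-API-Key` header name is an assumption, since the header the server expects is not documented here, and `build_generate_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Optional


def build_generate_request(base_url: str, download_from_sharepoint: bool = False,
                           api_key: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/generate request."""
    body = json.dumps({"download_from_sharepoint": download_from_sharepoint}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key  # assumed header name; check api_server.py
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate", data=body, headers=headers, method="POST"
    )
```

Sending it is then a matter of `urllib.request.urlopen(req, timeout=60)`. Keeping construction separate from sending makes the request easy to inspect or log before it goes out.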
Data Models
PunchlistItem:
- punchlist_name, description, priority, date_identified, date_completed
- status, status_updates, issue_image, age_days
VendorMetrics:
- vendor_name, total_items, closed_count, open_count, monitor_count
- updates_24h (added, closed, changed_to_monitor)
- oldest_unaddressed (top 3)
- very_high_priority_items, high_priority_items
FullReport:
- report_generated_at, vendors[], summary{}
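To make the relationship between the models concrete, here is a reduced sketch that uses standard-library dataclasses as a stand-in for the Pydantic models, so it runs without extra dependencies. It keeps only a few of the fields listed above and assumes that "open" and "unaddressed" both mean status Incomplete; `build_metrics` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PunchlistItem:
    punchlist_name: str
    status: str = "Incomplete"       # Complete, Monitor, or Incomplete
    priority: Optional[str] = None
    age_days: Optional[int] = None   # days since date_identified


@dataclass
class VendorMetrics:
    vendor_name: str
    total_items: int = 0
    closed_count: int = 0
    open_count: int = 0
    monitor_count: int = 0
    oldest_unaddressed: List[PunchlistItem] = field(default_factory=list)


def build_metrics(vendor_name: str, items: List[PunchlistItem]) -> VendorMetrics:
    """Count items by status and keep the three oldest Incomplete items."""
    unaddressed = [i for i in items if i.status == "Incomplete"]
    # Items with no known age sort as age 0, i.e. last.
    oldest = sorted(unaddressed, key=lambda i: i.age_days or 0, reverse=True)[:3]
    return VendorMetrics(
        vendor_name=vendor_name,
        total_items=len(items),
        closed_count=sum(i.status == "Complete" for i in items),
        open_count=len(unaddressed),
        monitor_count=sum(i.status == "Monitor" for i in items),
        oldest_unaddressed=oldest,
    )
```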
🔗 Taskboard Integration Possibilities
Option 1: Embedded Widget/Page
Create a new page in Taskboard (/vendor-reports) that:
- Uses Taskboard's authentication (already authenticated users)
- Embeds the generated HTML report in an iframe or renders it directly
- Provides a button to trigger report generation
- Shows report history/list
Implementation:
// taskboard/src/app/(dashboard)/vendor-reports/page.tsx
// - Call Python API server to generate reports
// - Display generated HTML reports
// - Use Taskboard's UI components for consistency
Option 2: API Integration
Create Taskboard API routes that proxy to the Python API:
- POST /api/vendor-reports/generate → Calls Python POST /api/generate
- GET /api/vendor-reports/list → Calls Python GET /api/reports
- GET /api/vendor-reports/status → Calls Python GET /api/status
Benefits:
- Single authentication system (Taskboard)
- Consistent API patterns
- Can add Taskboard-specific features (notifications, task linking)
Option 3: Background Service
Run the Python scheduler as a background service that:
- Generates reports on schedule
- Saves reports to a shared location
- Taskboard displays the latest report
- Can trigger notifications when reports are updated
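For the shared-location approach, whichever side serves the report needs a way to pick the newest file. A minimal sketch, assuming reports are HTML files in a single directory and that modification time is a good enough proxy for "latest" (`latest_report` is a hypothetical helper):

```python
from pathlib import Path
from typing import Optional


def latest_report(shared_dir: str, pattern: str = "*.html") -> Optional[Path]:
    """Return the most recently modified report in shared_dir, or None if there is none."""
    directory = Path(shared_dir)
    if not directory.is_dir():
        return None
    candidates = [p for p in directory.glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

Comparing the returned path or its modification time with the previously seen value is one simple way to decide when to send an "updated report" notification.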
Option 4: Task Integration
Link reports to Taskboard tasks:
- Create tasks for vendors with unaddressed items
- Link report generation to project/task completion
- Use report metrics in task dashboards
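As an illustration of the first bullet, the report JSON could be turned into task drafts along these lines. The `title` and `description` fields are placeholders rather than Taskboard's actual task schema, and `draft_tasks` is a hypothetical helper.

```python
from typing import Dict, List


def draft_tasks(report: Dict) -> List[Dict]:
    """Build one task draft per vendor that still has unaddressed items."""
    tasks = []
    for vendor in report.get("vendors", []):
        oldest = vendor.get("oldest_unaddressed", [])
        if not oldest:
            continue
        names = ", ".join(item.get("punchlist_name", "?") for item in oldest)
        tasks.append({
            # Placeholder fields; adapt to the real Taskboard task schema.
            "title": f"Follow up with {vendor['vendor_name']} ({vendor.get('open_count', 0)} open)",
            "description": f"Oldest unaddressed items: {names}",
        })
    return tasks
```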
🚀 Usage Examples
Command Line
# Generate report from local files
python report_generator.py
# Generate with custom directories
python report_generator.py --reports-dir /path/to/excel --output /path/to/output.json
Web UI
# Start web UI server
python web_ui.py
# Open browser: http://localhost:8080
# Click "Update Data from SharePoint" → "Generate Report"
API
# Generate report via API
curl -X POST http://localhost:8080/api/generate \
  -H "Content-Type: application/json" \
  -d '{"download_from_sharepoint": false}'
# Update from SharePoint
curl -X POST http://localhost:8080/api/update-sharepoint
Scheduled
# Start scheduler (runs continuously)
python scheduler.py
# Configured via config.yaml:
# scheduler:
#   enabled: true
#   schedule_type: "cron"
#   cron_expression: "0 8 * * *" # 8 AM daily
Programmatic (Python)
from report_generator import generate_report
# Generate report
report_data = generate_report(
    reports_dir="reports",
    output_file="output/report.json",
    verbose=True
)
# Access data
vendors = report_data['vendors']
summary = report_data['summary']
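Once the data is loaded, a common next step is to rank vendors by outstanding work. The sketch below assumes `report_data` is a plain dict shaped like the JSON report format described in this document; `vendors_by_open_items` is a hypothetical helper, not part of report_generator.py.

```python
from typing import Dict, List, Tuple


def vendors_by_open_items(report_data: Dict) -> List[Tuple[str, int]]:
    """Return (vendor_name, open_count) pairs, most open items first."""
    pairs = [(v["vendor_name"], v.get("open_count", 0)) for v in report_data["vendors"]]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```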
📊 Report Structure
JSON Report Format
{
  "report_generated_at": "2025-11-06T16:00:00",
  "vendors": [
    {
      "vendor_name": "VendorName",
      "total_items": 10,
      "closed_count": 5,
      "open_count": 3,
      "monitor_count": 2,
      "updates_24h": {
        "added": [...],
        "closed": [...],
        "changed_to_monitor": [...]
      },
      "oldest_unaddressed": [...],
      "very_high_priority_items": [...],
      "high_priority_items": [...],
      "closed_items": [...],
      "monitor_items": [...],
      "open_items": [...]
    }
  ],
  "summary": {
    "total_vendors": 5,
    "total_items": 50,
    "total_closed": 25,
    "total_open": 15,
    "total_monitor": 10
  }
}
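A consumer of report.json may want to sanity-check it before display. The sketch below assumes each summary total is the plain sum of the matching per-vendor count. That assumption might not hold if an item shared by combined vendors is counted under each vendor, so mismatches are best treated as a warning rather than an error. `summary_mismatches` is a hypothetical helper.

```python
from typing import Dict, List


def summary_mismatches(report: Dict) -> List[str]:
    """Compare each summary total with the sum of the per-vendor counts."""
    vendors = report.get("vendors", [])
    summary = report.get("summary", {})
    expected = {
        "total_vendors": len(vendors),
        "total_items": sum(v.get("total_items", 0) for v in vendors),
        "total_closed": sum(v.get("closed_count", 0) for v in vendors),
        "total_open": sum(v.get("open_count", 0) for v in vendors),
        "total_monitor": sum(v.get("monitor_count", 0) for v in vendors),
    }
    return [
        f"{key}: summary={summary.get(key)!r}, computed={value}"
        for key, value in expected.items()
        if summary.get(key) != value
    ]
```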
HTML Report Features
- Summary Cards: Overview statistics
- Vendor Tabs: Quick navigation between vendors
- Status Tabs: Filter by All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open
- Search & Filters: Search by name/description, filter by vendor/status/priority
- Quick Filters: Show only vendors with updates or oldest items
- Responsive Design: Works on desktop and mobile
🔐 Authentication & Security
Current State
- Web UI: Optional API key authentication
- SharePoint: Azure AD app authentication (recommended) or user credentials
- No User Management: Standalone application
Taskboard Integration Benefits
- Leverage Existing Auth: Use Taskboard's Authentik/Microsoft Entra ID authentication
- Role-Based Access: Control who can generate/view reports
- Audit Trail: Track who generated reports (via Taskboard user system)
- Secure Storage: Use Taskboard's file storage for reports
📝 Integration Checklist
Phase 1: Basic Integration
- Set up Python API server as background service
- Create Taskboard API route that proxies to Python API
- Create Taskboard page to display reports
- Add "Generate Report" button in Taskboard UI
Phase 2: Enhanced Integration
- Use Taskboard authentication for report access
- Store report metadata in Taskboard database
- Add report history/versioning
- Link reports to projects/tasks
Phase 3: Advanced Features
- Scheduled report generation via Taskboard
- Notifications when reports are generated
- Dashboard widgets showing report metrics
- Export reports to Taskboard tasks/boards
🛠️ Development Notes
Running Locally
# Setup
cd vendor_report
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Configure
cp config.yaml.template config.yaml
# Edit config.yaml with SharePoint credentials
# Run Web UI
python web_ui.py
# Open http://localhost:8080
Deployment Considerations
- Python Environment: Requires Python 3.8+
- Dependencies: Install via pip
- Configuration: Store secrets securely (environment variables or vault)
- Port: Default 8080 (configurable)
- File Storage: Reports saved to output/ directory
- SharePoint: Requires Azure AD app registration
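One way to keep secrets out of config.yaml is to overlay environment variables on the loaded configuration. This is a sketch only: the variable names below are invented for illustration, and config.py may already handle secrets differently.

```python
import copy
import os
from typing import Dict, Mapping

# Hypothetical variable names, mapped to (section, key) in config.yaml.
ENV_OVERRIDES = {
    "VENDOR_REPORT_CLIENT_ID": ("sharepoint", "client_id"),
    "VENDOR_REPORT_CLIENT_SECRET": ("sharepoint", "client_secret"),
    "VENDOR_REPORT_API_KEY": ("api", "api_key"),
}


def apply_env_overrides(config: Dict, env: Mapping[str, str] = os.environ) -> Dict:
    """Return a copy of config with secrets replaced by environment values when set."""
    merged = copy.deepcopy(config)
    for var, (section, key) in ENV_OVERRIDES.items():
        if env.get(var):
            merged.setdefault(section, {})[key] = env[var]
    return merged
```

The function would be applied to the dict produced by parsing config.yaml, so the file can ship with placeholder values while real credentials come from the deployment environment.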
Error Handling
- Graceful handling of missing Excel files
- SharePoint connection errors logged
- Invalid data formats handled
- User-friendly error messages in Web UI
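Graceful handling of missing Excel files could look like the following sketch, which logs a warning and returns an empty list instead of raising. The function name, the logger name, and the choice to look only for .xlsx files are assumptions.

```python
import logging
from pathlib import Path
from typing import List

logger = logging.getLogger("vendor_report")


def find_excel_files(reports_dir: str) -> List[Path]:
    """List .xlsx files in reports_dir; warn instead of raising when none exist."""
    directory = Path(reports_dir)
    if not directory.is_dir():
        logger.warning("Reports directory %s does not exist", directory)
        return []
    # Skip Excel lock files such as "~$report.xlsx", created while a workbook is open.
    files = sorted(p for p in directory.glob("*.xlsx") if not p.name.startswith("~$"))
    if not files:
        logger.warning("No Excel files found in %s", directory)
    return files
```

Returning an empty list lets the caller decide whether to show a friendly message in the Web UI or to skip a scheduled run.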
📚 Additional Resources
- SharePoint Setup: See SHAREPOINT_SETUP.md for detailed Azure AD configuration
- Quick Start: See QUICK_START.md for 5-minute setup guide
- Full Documentation: See README.md for complete usage guide
💡 Integration Ideas for Taskboard
- Vendor Dashboard: Show vendor metrics as cards/widgets
- Report History: Track when reports were generated, by whom
- Task Creation: Auto-create tasks for vendors with oldest unaddressed items
- Notifications: Alert project managers when reports are generated
- Export to Tasks: Convert report items to Taskboard tasks
- Project Linking: Associate reports with Taskboard projects
- Scheduled Reports: Use Taskboard's scheduling to trigger reports
- Role-Based Views: Different report views for different user roles
🔄 Current Status
- ✅ Core functionality complete
- ✅ SharePoint integration working
- ✅ Web UI functional
- ✅ API endpoints available
- ✅ Scheduled generation supported
- ⏳ Taskboard integration pending
- ⏳ Authentication integration pending
- ⏳ Database storage pending
Last Updated: November 6, 2025
Version: 1.0
Status: Production Ready (Standalone), Integration Ready (Taskboard)