# Vendor Report Generator - Taskboard Integration Context ## 🎯 Goal & Purpose The **Vendor Report Generator** is a Python-based tool designed to automate the generation of comprehensive vendor punchlist reports from Excel files stored in SharePoint. The goal is to: 1. **Automate Report Generation**: Eliminate manual Excel processing and report creation 2. **Centralize Data**: Pull vendor punchlist data directly from SharePoint 3. **Provide Insights**: Generate actionable reports with metrics, priorities, and status tracking 4. **Enable Integration**: Make reports accessible within Taskboard for team collaboration ### Business Value - **Time Savings**: Automates hours of manual report generation - **Accuracy**: Consistent data normalization and calculation - **Visibility**: Real-time vendor status tracking and metrics - **Accessibility**: Web-based interface for non-technical users - **Integration Ready**: Can be embedded as a tool/widget in Taskboard --- ## 📋 Application Overview ### What It Does The application processes Excel files containing vendor punchlist items and generates: - **Interactive HTML Reports**: Searchable, filterable web reports with vendor tabs, status filters, and priority grouping - **JSON Data**: Structured data for further processing or API integration - **Metrics**: Per-vendor statistics (total items, closed/open counts, 24-hour updates, oldest unaddressed items) ### Key Features 1. **Excel Processing**: Direct pandas-based reading (no manual conversion needed) 2. **Data Normalization**: Automatically handles vendor name variations, status inconsistencies, priority classifications 3. **24-Hour Tracking**: Identifies items added, closed, or changed to monitor status in the last 24 hours (Baltimore/Eastern timezone) 4. **Priority Classification**: Groups items by Very High, High, Medium, Low priorities 5. **Oldest Items**: Highlights the oldest 3 unaddressed items per vendor 6. **SharePoint Integration**: Automatically downloads Excel files from SharePoint 7. **Scheduled Generation**: Can run automatically on a schedule 8. **Web UI**: User-friendly interface for generating reports 9. **REST API**: Programmatic access for integration --- ## 🏗️ Architecture & Components ### Core Components ``` vendor_report/ ├── report_generator.py # Main entry point - orchestrates report generation ├── data_preprocessor.py # Excel parsing, normalization, data cleaning ├── html_generator.py # Generates interactive HTML reports ├── models.py # Pydantic data models for validation ├── sharepoint_downloader.py # SharePoint file downloader ├── scheduler.py # Scheduled report generation ├── api_server.py # REST API server ├── web_ui.py # Web UI server (Flask-based) ├── config.py # Configuration management └── config.yaml # Configuration file ``` ### Data Flow ``` SharePoint Excel Files ↓ [SharePoint Downloader] → Local reports/ directory ↓ [Data Preprocessor] → Normalize vendors, statuses, priorities, parse dates ↓ [Report Generator] → Calculate metrics, group by vendor, identify updates ↓ [HTML Generator] → Generate interactive report.html ↓ [Output] → output/report.json + output/report.html ``` ### Processing Pipeline 1. **Input**: Excel files with columns: - Punchlist Name, Vendor, Priority, Description, Date Identified, Status Updates, Issue Image, Status, Date Completed 2. **Preprocessing**: - Parse Excel files using pandas - Normalize vendor names (handle case variations, combined vendors) - Normalize statuses (Complete, Monitor, Incomplete) - Classify priorities (Very High, High, Medium, Low) - Parse dates (multiple formats supported) - Calculate 24-hour windows (Baltimore/Eastern timezone) - Calculate item age (days since identified) 3. **Report Generation**: - Group items by vendor - Calculate metrics per vendor (total, closed, open, monitor counts) - Identify 24-hour updates (added, closed, changed to monitor) - Find oldest 3 unaddressed items per vendor - Group by priority levels - Generate JSON structure - Generate HTML report 4. **Output**: - `output/report.json`: Structured JSON data - `output/report.html`: Interactive HTML report - `output/preprocessed_data.txt`: Debug/preview data --- ## 🔧 Technical Details ### Dependencies ```python # Core pandas>=2.0.0 # Excel processing openpyxl>=3.0.0 # Excel file reading pydantic>=2.0.0 # Data validation # Optional: SharePoint Office365-REST-Python-Client>=2.3.0 # SharePoint API # Optional: Scheduling apscheduler>=3.10.0 # Task scheduling # Optional: Web UI/API flask>=2.3.0 # Web framework flask-cors>=4.0.0 # CORS support # Configuration pyyaml>=6.0 # YAML config parsing ``` ### Configuration Configuration is managed via `config.yaml`: ```yaml sharepoint: enabled: true/false site_url: "https://company.sharepoint.com/sites/SiteName" folder_path: "/Shared Documents/Reports" use_app_authentication: true # Azure AD app auth (recommended) client_id: "azure-ad-client-id" client_secret: "azure-ad-client-secret" scheduler: enabled: true/false schedule_type: "interval" | "cron" | "once" interval_hours: 24 cron_expression: "0 8 * * *" # 8 AM daily api: enabled: true/false port: 8080 api_key: "optional-api-key" report: output_dir: "output" reports_dir: "reports" ``` ### API Endpoints **Web UI Server** (`web_ui.py`): - `GET /` - Web UI interface - `POST /api/generate` - Generate report - `POST /api/update-sharepoint` - Download files from SharePoint - `GET /api/status` - Service status - `GET /api/reports` - List generated reports - `GET /api/config` - Configuration (safe, no secrets) - `GET /reports/` - Serve report files **API Server** (`api_server.py`): - `POST /api/generate` - Generate report (programmatic) - `GET /api/status` - Service status - `GET /health` - Health check ### Data Models **PunchlistItem**: - punchlist_name, description, priority, date_identified, date_completed - status, status_updates, issue_image, age_days **VendorMetrics**: - vendor_name, total_items, closed_count, open_count, monitor_count - updates_24h (added, closed, changed_to_monitor) - oldest_unaddressed (top 3) - very_high_priority_items, high_priority_items **FullReport**: - report_generated_at, vendors[], summary{} --- ## 🔗 Taskboard Integration Possibilities ### Option 1: Embedded Widget/Page Create a new page in Taskboard (`/vendor-reports`) that: - Uses Taskboard's authentication (already authenticated users) - Embeds the generated HTML report in an iframe or renders it directly - Provides a button to trigger report generation - Shows report history/list **Implementation**: ```typescript // taskboard/src/app/(dashboard)/vendor-reports/page.tsx // - Call Python API server to generate reports // - Display generated HTML reports // - Use Taskboard's UI components for consistency ``` ### Option 2: API Integration Create Taskboard API routes that proxy to the Python API: - `POST /api/vendor-reports/generate` → Calls Python `POST /api/generate` - `GET /api/vendor-reports/list` → Calls Python `GET /api/reports` - `GET /api/vendor-reports/status` → Calls Python `GET /api/status` **Benefits**: - Single authentication system (Taskboard) - Consistent API patterns - Can add Taskboard-specific features (notifications, task linking) ### Option 3: Background Service Run the Python scheduler as a background service that: - Generates reports on schedule - Saves reports to a shared location - Taskboard displays the latest report - Can trigger notifications when reports are updated ### Option 4: Task Integration Link reports to Taskboard tasks: - Create tasks for vendors with unaddressed items - Link report generation to project/task completion - Use report metrics in task dashboards --- ## 🚀 Usage Examples ### Command Line ```bash # Generate report from local files python report_generator.py # Generate with custom directories python report_generator.py --reports-dir /path/to/excel --output /path/to/output.json ``` ### Web UI ```bash # Start web UI server python web_ui.py # Open browser: http://localhost:8080 # Click "Update Data from SharePoint" → "Generate Report" ``` ### API ```bash # Generate report via API curl -X POST http://localhost:8080/api/generate \ -H "Content-Type: application/json" \ -d '{"download_from_sharepoint": false}' # Update from SharePoint curl -X POST http://localhost:8080/api/update-sharepoint ``` ### Scheduled ```bash # Start scheduler (runs continuously) python scheduler.py # Configured via config.yaml: # scheduler: # enabled: true # schedule_type: "cron" # cron_expression: "0 8 * * *" # 8 AM daily ``` ### Programmatic (Python) ```python from report_generator import generate_report # Generate report report_data = generate_report( reports_dir="reports", output_file="output/report.json", verbose=True ) # Access data vendors = report_data['vendors'] summary = report_data['summary'] ``` --- ## 📊 Report Structure ### JSON Report Format ```json { "report_generated_at": "2025-11-06T16:00:00", "vendors": [ { "vendor_name": "VendorName", "total_items": 10, "closed_count": 5, "open_count": 3, "monitor_count": 2, "updates_24h": { "added": [...], "closed": [...], "changed_to_monitor": [...] }, "oldest_unaddressed": [...], "very_high_priority_items": [...], "high_priority_items": [...], "closed_items": [...], "monitor_items": [...], "open_items": [...] } ], "summary": { "total_vendors": 5, "total_items": 50, "total_closed": 25, "total_open": 15, "total_monitor": 10 } } ``` ### HTML Report Features - **Summary Cards**: Overview statistics - **Vendor Tabs**: Quick navigation between vendors - **Status Tabs**: Filter by All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open - **Search & Filters**: Search by name/description, filter by vendor/status/priority - **Quick Filters**: Show only vendors with updates or oldest items - **Responsive Design**: Works on desktop and mobile --- ## 🔐 Authentication & Security ### Current State - **Web UI**: Optional API key authentication - **SharePoint**: Azure AD app authentication (recommended) or user credentials - **No User Management**: Standalone application ### Taskboard Integration Benefits - **Leverage Existing Auth**: Use Taskboard's Authentik/Microsoft Entra ID authentication - **Role-Based Access**: Control who can generate/view reports - **Audit Trail**: Track who generated reports (via Taskboard user system) - **Secure Storage**: Use Taskboard's file storage for reports --- ## 📝 Integration Checklist ### Phase 1: Basic Integration - [ ] Set up Python API server as background service - [ ] Create Taskboard API route that proxies to Python API - [ ] Create Taskboard page to display reports - [ ] Add "Generate Report" button in Taskboard UI ### Phase 2: Enhanced Integration - [ ] Use Taskboard authentication for report access - [ ] Store report metadata in Taskboard database - [ ] Add report history/versioning - [ ] Link reports to projects/tasks ### Phase 3: Advanced Features - [ ] Scheduled report generation via Taskboard - [ ] Notifications when reports are generated - [ ] Dashboard widgets showing report metrics - [ ] Export reports to Taskboard tasks/boards --- ## 🛠️ Development Notes ### Running Locally ```bash # Setup cd vendor_report python -m venv venv source venv/bin/activate # Windows: venv\Scripts\activate pip install -r requirements.txt # Configure cp config.yaml.template config.yaml # Edit config.yaml with SharePoint credentials # Run Web UI python web_ui.py # Open http://localhost:8080 ``` ### Deployment Considerations - **Python Environment**: Requires Python 3.8+ - **Dependencies**: Install via pip - **Configuration**: Store secrets securely (environment variables or vault) - **Port**: Default 8080 (configurable) - **File Storage**: Reports saved to `output/` directory - **SharePoint**: Requires Azure AD app registration ### Error Handling - Graceful handling of missing Excel files - SharePoint connection errors logged - Invalid data formats handled - User-friendly error messages in Web UI --- ## 📚 Additional Resources - **SharePoint Setup**: See `SHAREPOINT_SETUP.md` for detailed Azure AD configuration - **Quick Start**: See `QUICK_START.md` for 5-minute setup guide - **Full Documentation**: See `README.md` for complete usage guide --- ## 💡 Integration Ideas for Taskboard 1. **Vendor Dashboard**: Show vendor metrics as cards/widgets 2. **Report History**: Track when reports were generated, by whom 3. **Task Creation**: Auto-create tasks for vendors with oldest unaddressed items 4. **Notifications**: Alert project managers when reports are generated 5. **Export to Tasks**: Convert report items to Taskboard tasks 6. **Project Linking**: Associate reports with Taskboard projects 7. **Scheduled Reports**: Use Taskboard's scheduling to trigger reports 8. **Role-Based Views**: Different report views for different user roles --- ## 🔄 Current Status - ✅ Core functionality complete - ✅ SharePoint integration working - ✅ Web UI functional - ✅ API endpoints available - ✅ Scheduled generation supported - ⏳ Taskboard integration pending - ⏳ Authentication integration pending - ⏳ Database storage pending --- **Last Updated**: November 6, 2025 **Version**: 1.0 **Status**: Production Ready (Standalone), Integration Ready (Taskboard)