vendor_report/TASKBOARD_INTEGRATION_CONTEXT.md
2025-11-06 20:50:19 +04:00

471 lines
14 KiB
Markdown

# Vendor Report Generator - Taskboard Integration Context
## 🎯 Goal & Purpose
The **Vendor Report Generator** is a Python-based tool designed to automate the generation of comprehensive vendor punchlist reports from Excel files stored in SharePoint. The goal is to:
1. **Automate Report Generation**: Eliminate manual Excel processing and report creation
2. **Centralize Data**: Pull vendor punchlist data directly from SharePoint
3. **Provide Insights**: Generate actionable reports with metrics, priorities, and status tracking
4. **Enable Integration**: Make reports accessible within Taskboard for team collaboration
### Business Value
- **Time Savings**: Automates hours of manual report generation
- **Accuracy**: Consistent data normalization and calculation
- **Visibility**: Real-time vendor status tracking and metrics
- **Accessibility**: Web-based interface for non-technical users
- **Integration Ready**: Can be embedded as a tool/widget in Taskboard
---
## 📋 Application Overview
### What It Does
The application processes Excel files containing vendor punchlist items and generates:
- **Interactive HTML Reports**: Searchable, filterable web reports with vendor tabs, status filters, and priority grouping
- **JSON Data**: Structured data for further processing or API integration
- **Metrics**: Per-vendor statistics (total items, closed/open counts, 24-hour updates, oldest unaddressed items)
### Key Features
1. **Excel Processing**: Direct pandas-based reading (no manual conversion needed)
2. **Data Normalization**: Automatically handles vendor name variations, status inconsistencies, priority classifications
3. **24-Hour Tracking**: Identifies items added, closed, or changed to monitor status in the last 24 hours (Baltimore/Eastern timezone)
4. **Priority Classification**: Groups items by Very High, High, Medium, Low priorities
5. **Oldest Items**: Highlights the oldest 3 unaddressed items per vendor
6. **SharePoint Integration**: Automatically downloads Excel files from SharePoint
7. **Scheduled Generation**: Can run automatically on a schedule
8. **Web UI**: User-friendly interface for generating reports
9. **REST API**: Programmatic access for integration
---
## 🏗️ Architecture & Components
### Core Components
```
vendor_report/
├── report_generator.py # Main entry point - orchestrates report generation
├── data_preprocessor.py # Excel parsing, normalization, data cleaning
├── html_generator.py # Generates interactive HTML reports
├── models.py # Pydantic data models for validation
├── sharepoint_downloader.py # SharePoint file downloader
├── scheduler.py # Scheduled report generation
├── api_server.py # REST API server
├── web_ui.py # Web UI server (Flask-based)
├── config.py # Configuration management
└── config.yaml # Configuration file
```
### Data Flow
```
SharePoint Excel Files
[SharePoint Downloader] → Local reports/ directory
[Data Preprocessor] → Normalize vendors, statuses, priorities, parse dates
[Report Generator] → Calculate metrics, group by vendor, identify updates
[HTML Generator] → Generate interactive report.html
[Output] → output/report.json + output/report.html
```
### Processing Pipeline
1. **Input**: Excel files with columns:
- Punchlist Name, Vendor, Priority, Description, Date Identified, Status Updates, Issue Image, Status, Date Completed
2. **Preprocessing**:
- Parse Excel files using pandas
- Normalize vendor names (handle case variations, combined vendors)
- Normalize statuses (Complete, Monitor, Incomplete)
- Classify priorities (Very High, High, Medium, Low)
- Parse dates (multiple formats supported)
- Calculate 24-hour windows (Baltimore/Eastern timezone)
- Calculate item age (days since identified)
3. **Report Generation**:
- Group items by vendor
- Calculate metrics per vendor (total, closed, open, monitor counts)
- Identify 24-hour updates (added, closed, changed to monitor)
- Find oldest 3 unaddressed items per vendor
- Group by priority levels
- Generate JSON structure
- Generate HTML report
4. **Output**:
- `output/report.json`: Structured JSON data
- `output/report.html`: Interactive HTML report
- `output/preprocessed_data.txt`: Debug/preview data
---
## 🔧 Technical Details
### Dependencies
```python
# Core
pandas>=2.0.0 # Excel processing
openpyxl>=3.0.0 # Excel file reading
pydantic>=2.0.0 # Data validation
# Optional: SharePoint
Office365-REST-Python-Client>=2.3.0 # SharePoint API
# Optional: Scheduling
apscheduler>=3.10.0 # Task scheduling
# Optional: Web UI/API
flask>=2.3.0 # Web framework
flask-cors>=4.0.0 # CORS support
# Configuration
pyyaml>=6.0 # YAML config parsing
```
### Configuration
Configuration is managed via `config.yaml`:
```yaml
sharepoint:
enabled: true/false
site_url: "https://company.sharepoint.com/sites/SiteName"
folder_path: "/Shared Documents/Reports"
use_app_authentication: true # Azure AD app auth (recommended)
client_id: "azure-ad-client-id"
client_secret: "azure-ad-client-secret"
scheduler:
enabled: true/false
schedule_type: "interval" | "cron" | "once"
interval_hours: 24
cron_expression: "0 8 * * *" # 8 AM daily
api:
enabled: true/false
port: 8080
api_key: "optional-api-key"
report:
output_dir: "output"
reports_dir: "reports"
```
### API Endpoints
**Web UI Server** (`web_ui.py`):
- `GET /` - Web UI interface
- `POST /api/generate` - Generate report
- `POST /api/update-sharepoint` - Download files from SharePoint
- `GET /api/status` - Service status
- `GET /api/reports` - List generated reports
- `GET /api/config` - Configuration (safe, no secrets)
- `GET /reports/<filename>` - Serve report files
**API Server** (`api_server.py`):
- `POST /api/generate` - Generate report (programmatic)
- `GET /api/status` - Service status
- `GET /health` - Health check
### Data Models
**PunchlistItem**:
- punchlist_name, description, priority, date_identified, date_completed
- status, status_updates, issue_image, age_days
**VendorMetrics**:
- vendor_name, total_items, closed_count, open_count, monitor_count
- updates_24h (added, closed, changed_to_monitor)
- oldest_unaddressed (top 3)
- very_high_priority_items, high_priority_items
**FullReport**:
- report_generated_at, vendors[], summary{}
---
## 🔗 Taskboard Integration Possibilities
### Option 1: Embedded Widget/Page
Create a new page in Taskboard (`/vendor-reports`) that:
- Uses Taskboard's authentication (already authenticated users)
- Embeds the generated HTML report in an iframe or renders it directly
- Provides a button to trigger report generation
- Shows report history/list
**Implementation**:
```typescript
// taskboard/src/app/(dashboard)/vendor-reports/page.tsx
// - Call Python API server to generate reports
// - Display generated HTML reports
// - Use Taskboard's UI components for consistency
```
### Option 2: API Integration
Create Taskboard API routes that proxy to the Python API:
- `POST /api/vendor-reports/generate` → Calls Python `POST /api/generate`
- `GET /api/vendor-reports/list` → Calls Python `GET /api/reports`
- `GET /api/vendor-reports/status` → Calls Python `GET /api/status`
**Benefits**:
- Single authentication system (Taskboard)
- Consistent API patterns
- Can add Taskboard-specific features (notifications, task linking)
### Option 3: Background Service
Run the Python scheduler as a background service that:
- Generates reports on schedule
- Saves reports to a shared location
- Taskboard displays the latest report
- Can trigger notifications when reports are updated
### Option 4: Task Integration
Link reports to Taskboard tasks:
- Create tasks for vendors with unaddressed items
- Link report generation to project/task completion
- Use report metrics in task dashboards
---
## 🚀 Usage Examples
### Command Line
```bash
# Generate report from local files
python report_generator.py
# Generate with custom directories
python report_generator.py --reports-dir /path/to/excel --output /path/to/output.json
```
### Web UI
```bash
# Start web UI server
python web_ui.py
# Open browser: http://localhost:8080
# Click "Update Data from SharePoint" → "Generate Report"
```
### API
```bash
# Generate report via API
curl -X POST http://localhost:8080/api/generate \
-H "Content-Type: application/json" \
-d '{"download_from_sharepoint": false}'
# Update from SharePoint
curl -X POST http://localhost:8080/api/update-sharepoint
```
### Scheduled
```bash
# Start scheduler (runs continuously)
python scheduler.py
# Configured via config.yaml:
# scheduler:
# enabled: true
# schedule_type: "cron"
# cron_expression: "0 8 * * *" # 8 AM daily
```
### Programmatic (Python)
```python
from report_generator import generate_report
# Generate report
report_data = generate_report(
reports_dir="reports",
output_file="output/report.json",
verbose=True
)
# Access data
vendors = report_data['vendors']
summary = report_data['summary']
```
---
## 📊 Report Structure
### JSON Report Format
```json
{
"report_generated_at": "2025-11-06T16:00:00",
"vendors": [
{
"vendor_name": "VendorName",
"total_items": 10,
"closed_count": 5,
"open_count": 3,
"monitor_count": 2,
"updates_24h": {
"added": [...],
"closed": [...],
"changed_to_monitor": [...]
},
"oldest_unaddressed": [...],
"very_high_priority_items": [...],
"high_priority_items": [...],
"closed_items": [...],
"monitor_items": [...],
"open_items": [...]
}
],
"summary": {
"total_vendors": 5,
"total_items": 50,
"total_closed": 25,
"total_open": 15,
"total_monitor": 10
}
}
```
### HTML Report Features
- **Summary Cards**: Overview statistics
- **Vendor Tabs**: Quick navigation between vendors
- **Status Tabs**: Filter by All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open
- **Search & Filters**: Search by name/description, filter by vendor/status/priority
- **Quick Filters**: Show only vendors with updates or oldest items
- **Responsive Design**: Works on desktop and mobile
---
## 🔐 Authentication & Security
### Current State
- **Web UI**: Optional API key authentication
- **SharePoint**: Azure AD app authentication (recommended) or user credentials
- **No User Management**: Standalone application
### Taskboard Integration Benefits
- **Leverage Existing Auth**: Use Taskboard's Authentik/Microsoft Entra ID authentication
- **Role-Based Access**: Control who can generate/view reports
- **Audit Trail**: Track who generated reports (via Taskboard user system)
- **Secure Storage**: Use Taskboard's file storage for reports
---
## 📝 Integration Checklist
### Phase 1: Basic Integration
- [ ] Set up Python API server as background service
- [ ] Create Taskboard API route that proxies to Python API
- [ ] Create Taskboard page to display reports
- [ ] Add "Generate Report" button in Taskboard UI
### Phase 2: Enhanced Integration
- [ ] Use Taskboard authentication for report access
- [ ] Store report metadata in Taskboard database
- [ ] Add report history/versioning
- [ ] Link reports to projects/tasks
### Phase 3: Advanced Features
- [ ] Scheduled report generation via Taskboard
- [ ] Notifications when reports are generated
- [ ] Dashboard widgets showing report metrics
- [ ] Export reports to Taskboard tasks/boards
---
## 🛠️ Development Notes
### Running Locally
```bash
# Setup
cd vendor_report
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Configure
cp config.yaml.template config.yaml
# Edit config.yaml with SharePoint credentials
# Run Web UI
python web_ui.py
# Open http://localhost:8080
```
### Deployment Considerations
- **Python Environment**: Requires Python 3.8+
- **Dependencies**: Install via pip
- **Configuration**: Store secrets securely (environment variables or vault)
- **Port**: Default 8080 (configurable)
- **File Storage**: Reports saved to `output/` directory
- **SharePoint**: Requires Azure AD app registration
### Error Handling
- Graceful handling of missing Excel files
- SharePoint connection errors logged
- Invalid data formats handled
- User-friendly error messages in Web UI
---
## 📚 Additional Resources
- **SharePoint Setup**: See `SHAREPOINT_SETUP.md` for detailed Azure AD configuration
- **Quick Start**: See `QUICK_START.md` for 5-minute setup guide
- **Full Documentation**: See `README.md` for complete usage guide
---
## 💡 Integration Ideas for Taskboard
1. **Vendor Dashboard**: Show vendor metrics as cards/widgets
2. **Report History**: Track when reports were generated, by whom
3. **Task Creation**: Auto-create tasks for vendors with oldest unaddressed items
4. **Notifications**: Alert project managers when reports are generated
5. **Export to Tasks**: Convert report items to Taskboard tasks
6. **Project Linking**: Associate reports with Taskboard projects
7. **Scheduled Reports**: Use Taskboard's scheduling to trigger reports
8. **Role-Based Views**: Different report views for different user roles
---
## 🔄 Current Status
- ✅ Core functionality complete
- ✅ SharePoint integration working
- ✅ Web UI functional
- ✅ API endpoints available
- ✅ Scheduled generation supported
- ⏳ Taskboard integration pending
- ⏳ Authentication integration pending
- ⏳ Database storage pending
---
**Last Updated**: November 6, 2025
**Version**: 1.0
**Status**: Production Ready (Standalone), Integration Ready (Taskboard)