# Vendor Report Generator

A Python tool that generates comprehensive vendor punchlist reports from Excel files. The tool processes Excel data, normalizes vendor information, calculates metrics, and generates both JSON and interactive HTML reports.

> **📘 For Taskboard Integration**: See [TASKBOARD_INTEGRATION_CONTEXT.md](./TASKBOARD_INTEGRATION_CONTEXT.md) for detailed context and integration possibilities.

## Features

- **Direct Excel Processing**: Reads Excel files directly using pandas
- **Data Normalization**: Automatically normalizes vendor names, statuses, and priorities
- **24-Hour Updates**: Tracks items added, closed, or changed to monitor status in the last 24 hours (based on Baltimore/Eastern timezone)
- **Priority Tracking**: Groups items by priority levels (Very High, High, Medium, Low)
- **Oldest Unaddressed Items**: Identifies and highlights the oldest 3 unaddressed items per vendor
- **Interactive HTML Reports**: Generates searchable, filterable HTML reports with tabs and filters
- **JSON Export**: Exports structured JSON data for further processing
- **SharePoint Integration**: Automatically download Excel files from SharePoint
- **Scheduled Generation**: Automatically generate reports on a schedule (interval or cron)
- **Web API**: REST API for on-demand report generation

## Requirements

- Python 3.8 or higher
- Dependencies listed in `requirements.txt`

## Installation

1. **Clone the repository**:
   ```bash
   git clone https://gitea.lci.ge/ilia.gurielidze/vendor_report.git
   cd vendor_report
   ```

2. **Create a virtual environment** (recommended):
   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

## Setup

1. **Prepare your Excel files**:
   - Place your Excel files (`.xlsx` or `.xls`) in the `reports/` directory
   - Ensure your Excel files have the following columns (in order):
     - Column 0: Punchlist Name
     - Column 1: Vendor
     - Column 2: Priority
     - Column 3: Description
     - Column 4: Date Identified
     - Column 5: Status Updates
     - Column 6: Issue Image
     - Column 7: Status
     - Column 8: Date Completed (optional)

2. **Create necessary directories** (if they don't exist):
   ```bash
   mkdir -p reports output
   ```

## Usage

### Basic Usage

Generate a report from Excel files in the `reports/` directory:

```bash
python3 report_generator.py
```

This will:
- Process all Excel files in the `reports/` directory
- Generate a JSON report at `output/report.json`
- Generate an HTML report at `output/report.html`
- Save preprocessed data to `output/preprocessed_data.txt`

### Command-Line Options

```bash
python3 report_generator.py [OPTIONS]
```

**Options**:
- `--reports-dir DIR`: Directory containing Excel files (default: `reports`)
- `--output FILE`: Output JSON file path (default: `output/report.json`)
- `--verbose`: Print verbose output (default: True)

**Examples**:

```bash
# Use a custom reports directory
python3 report_generator.py --reports-dir /path/to/excel/files

# Specify custom output file
python3 report_generator.py --output /path/to/output/report.json

# Combine options
python3 report_generator.py --reports-dir my_reports --output my_output/report.json
```

### Programmatic Usage

You can also use the report generator in your own Python scripts:

```python
from report_generator import generate_report

# Generate report with default settings
report_data = generate_report()

# Or with custom settings
report_data = generate_report(
    reports_dir="my_reports",
    output_file="my_output/report.json",
    verbose=True
)

# report_data is a dictionary containing the full report structure
print(f"Processed {len(report_data['vendors'])} vendors")
```

## Report Structure

### JSON Report Structure

The generated JSON report follows this structure:

```json
{
  "report_generated_at": "2025-11-05T22:00:00",
  "vendors": [
    {
      "vendor_name": "VendorName",
      "total_items": 10,
      "closed_count": 5,
      "open_count": 3,
      "monitor_count": 2,
      "updates_24h": {
        "added": [...],
        "closed": [...],
        "changed_to_monitor": [...]
      },
      "oldest_unaddressed": [...],
      "very_high_priority_items": [...],
      "high_priority_items": [...],
      "closed_items": [...],
      "monitor_items": [...],
      "open_items": [...]
    }
  ],
  "summary": {
    "total_vendors": 5,
    "total_items": 50,
    "total_closed": 25,
    "total_open": 15,
    "total_monitor": 10
  }
}
```

### HTML Report Features

The HTML report includes:

- **Summary Cards**: Overview statistics at the top
- **Vendor Tabs**: Quick navigation between vendors
- **Status Tabs**: Filter by status (All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open)
- **Search & Filters**:
  - Search by item name or description
  - Filter by vendor, status, or priority
- **Quick Filters**:
  - Show only vendors with yesterday's updates
  - Show only vendors with oldest unaddressed items
  - Show all vendors
- **Interactive Elements**: Click tabs to switch views, use filters to narrow down results

## Data Processing Details

### Vendor Name Normalization

The tool automatically normalizes vendor names:
- Handles case variations (e.g., "autstand" → "Autstand")
- Preserves intentional capitalization (e.g., "AutStand" stays as-is)
- Normalizes combined vendors (e.g., "Autstand/Beumer")
- Handles vendors in parentheses (e.g., "MFO (Amazon)")

### Status Normalization

Statuses are normalized to:
- **Complete**: Items with status containing "complete" or "complette"
- **Monitor**: Items with status containing "monitor" or "montor"
- **Incomplete**: All other items (default)

### Priority Classification

Priorities are classified as:
- **Very High**: Priority contains "(1) Very High" or "Very High"
- **High**: Priority contains "(2) High" or "High" (but not "Very High")
- **Medium**: Priority contains "(3) Medium" or "Medium"
- **Low**: Priority contains "(4) Low" or "Low"

### 24-Hour Window Calculation

The tool uses **Baltimore/Eastern timezone (America/New_York)** for calculating 24-hour updates:
- Items are considered "added in last 24h" if their `date_identified` falls on yesterday's date
- Items are considered "closed in last 24h" if their `date_completed` falls on yesterday's date
- Items are considered "changed to monitor" if their status is Monitor and the date falls within the 24-hour window

## Output Files

After running the generator, you'll find:

- `output/report.json`: Structured JSON report data
- `output/report.html`: Interactive HTML report (open in browser)
- `output/preprocessed_data.txt`: Human-readable preprocessed data (for debugging)

## Project Structure

```
vendor_report/
├── report_generator.py      # Main report generation script
├── data_preprocessor.py     # Excel data preprocessing and normalization
├── html_generator.py        # HTML report generation
├── models.py                # Pydantic data models
├── excel_to_text.py         # Utility for Excel to text conversion
├── sharepoint_downloader.py # SharePoint file downloader
├── scheduler.py             # Scheduled report generation
├── api_server.py            # REST API for on-demand reports
├── web_ui.py                # Web UI for easy access
├── config.py                # Configuration management
├── config.yaml.template     # Configuration template
├── requirements.txt         # Python dependencies
├── reports/                 # Directory for input Excel files
├── output/                  # Directory for generated reports
└── README.md                # This file
```

## Troubleshooting

### No Excel files found

Ensure your Excel files are in the `reports/` directory and have `.xlsx` or `.xls` extensions.

### Date parsing errors

The tool supports common date formats:
- `MM/DD/YY` (e.g., `10/14/25`)
- `MM/DD/YYYY` (e.g., `10/14/2025`)
- `YYYY-MM-DD` (e.g., `2025-10-17`)
- `YYYY-MM-DD HH:MM:SS` (e.g., `2025-10-17 00:00:00`)

### Permission errors

If you encounter permission errors, ensure you have write access to the `output/` directory.

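
The three checks above (input files present, dates in a supported format, `output/` writable) can be reproduced outside the tool with a short standalone script. This is a minimal sketch, not the tool's own code: the function names `find_excel_files`, `parse_date`, and `can_write` are hypothetical, and the order in which the date formats are tried is an assumption.

```python
import tempfile
from datetime import datetime
from pathlib import Path

# strptime equivalents of the formats listed under "Date parsing errors".
# The order they are tried in is an assumption, not taken from the tool.
DATE_FORMATS = ("%m/%d/%y", "%m/%d/%Y", "%Y-%m-%d", "%Y-%m-%d %H:%M:%S")


def find_excel_files(reports_dir="reports"):
    """Return the .xlsx/.xls files in reports_dir, or [] if it is missing."""
    path = Path(reports_dir)
    if not path.is_dir():
        return []
    return sorted(p for p in path.iterdir()
                  if p.suffix.lower() in {".xlsx", ".xls"})


def parse_date(text):
    """Return a datetime if text matches a supported format, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # not this format, try the next one
    return None


def can_write(output_dir="output"):
    """Return True if a temporary file can be created in output_dir."""
    try:
        with tempfile.TemporaryFile(dir=output_dir):
            return True
    except OSError:  # covers missing directory and permission errors
        return False


if __name__ == "__main__":
    print("Excel files found:", [p.name for p in find_excel_files()])
    print("Sample date parses:", parse_date("10/14/25") is not None)
    print("output/ is writable:", can_write())
```

Running the script from the project root prints one line per check, which can help narrow down which of the three problems applies before re-running the generator.
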
### Missing dependencies

If you get import errors, ensure all dependencies are installed:

```bash
pip install -r requirements.txt
```

## Timezone Notes

The tool uses **Baltimore/Eastern timezone (America/New_York)** for all date calculations. This ensures consistent 24-hour window calculations regardless of where the script is run. All dates are stored as timezone-aware datetime objects.

## SharePoint Integration

The application can automatically download Excel files from SharePoint before generating reports. This is useful when your source data is stored in SharePoint.

### Setup SharePoint Integration

1. **Create a configuration file**:
   ```bash
   cp config.yaml.template config.yaml
   ```

2. **Edit `config.yaml`** and configure SharePoint settings:
   ```yaml
   sharepoint:
     enabled: true
     site_url: "https://yourcompany.sharepoint.com/sites/YourSite"
     folder_path: "/Shared Documents/Reports"
     local_dir: "reports"
     use_app_authentication: true  # Recommended for automation
     client_id: "your-azure-ad-client-id"
     client_secret: "your-azure-ad-client-secret"
   ```

3. **Authentication Options**:

   **Option A: App Authentication (Recommended)**
   - Register an app in Azure AD
   - Grant SharePoint permissions (Sites.Read.All or Sites.ReadWrite.All)
   - Use `client_id` and `client_secret` in config
   - Set `use_app_authentication: true`

   **Option B: User Authentication**
   - Use your SharePoint username and password
   - Set `username` and `password` in config
   - Set `use_app_authentication: false`

4. **Test SharePoint download**:
   ```bash
   python sharepoint_downloader.py
   ```

### Manual SharePoint Download

Download files from SharePoint without generating a report:

```bash
python sharepoint_downloader.py
```

## Scheduled Report Generation

The application can automatically generate reports on a schedule, optionally downloading from SharePoint first.

### Setup Scheduling

1. **Edit `config.yaml`**:
   ```yaml
   scheduler:
     enabled: true
     schedule_type: "interval"  # or "cron"
     interval_hours: 24  # Generate every 24 hours
     # OR use cron expression:
     # cron_expression: "0 8 * * *"  # 8 AM daily
     timezone: "America/New_York"
   ```

2. **Start the scheduler**:
   ```bash
   python scheduler.py
   ```

   The scheduler will run continuously and generate reports according to your schedule.

3. **Schedule Types**:
   - **interval**: Generate report every N hours
   - **cron**: Use cron expression for precise scheduling (e.g., "0 8 * * *" for 8 AM daily)
   - **once**: Run once immediately (for testing)

### Running Scheduler as a Service

**Linux (systemd)**:

```bash
# Create service file: /etc/systemd/system/vendor-report-scheduler.service
[Unit]
Description=Vendor Report Scheduler
After=network.target

[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/vendor_report
ExecStart=/usr/bin/python3 /path/to/vendor_report/scheduler.py
Restart=always

[Install]
WantedBy=multi-user.target

# Enable and start
sudo systemctl enable vendor-report-scheduler
sudo systemctl start vendor-report-scheduler
```

**Windows (Task Scheduler)**:
- Create a scheduled task that runs `python scheduler.py` at startup or on a schedule

## Web UI & On-Demand Report Generation

The application includes both a **Web UI** and a **REST API** for generating reports on demand.

### Web UI (Recommended for Easy Access)

A simple, user-friendly web interface for generating reports without using the terminal.

1. **Start the Web UI server**:
   ```bash
   python web_ui.py
   ```

2. **Open in browser**:
   ```
   http://localhost:8080
   ```

3. **Features**:
   - One-click report generation
   - Download from SharePoint & generate (single button)
   - View generated reports
   - View service status
   - View configuration
   - No terminal knowledge required!

### REST API

The application also includes a REST API for integration with other systems or manual triggers.

### Setup API Server

1. **Edit `config.yaml`**:
   ```yaml
   api:
     enabled: true
     host: "0.0.0.0"
     port: 8080
     api_key: "your-secret-api-key"  # Optional, for authentication
   ```

2. **Start the Web UI** (recommended):
   ```bash
   python web_ui.py
   ```

   Then open `http://localhost:8080` in your browser.

   **OR start the API server** (for programmatic access):
   ```bash
   python api_server.py
   ```

3. **Generate report via API**:
   ```bash
   # Without authentication
   curl -X POST http://localhost:8080/api/generate \
     -H "Content-Type: application/json" \
     -d '{"download_from_sharepoint": true}'

   # With API key authentication
   curl -X POST http://localhost:8080/api/generate \
     -H "Content-Type: application/json" \
     -H "X-API-Key: your-secret-api-key" \
     -d '{"download_from_sharepoint": true}'
   ```

### API Endpoints

- **POST `/api/generate`**: Generate report on demand
  - Request body (optional):
    ```json
    {
      "download_from_sharepoint": true,
      "reports_dir": "reports",
      "output_file": "output/report.json"
    }
    ```
- **GET `/api/status`**: Get service status and configuration
- **GET `/health`**: Health check endpoint

### Example: Integration with Webhook

You can trigger report generation from SharePoint webhooks, Power Automate, or any HTTP client:

```python
import requests

response = requests.post(
    'http://your-server:8080/api/generate',
    json={'download_from_sharepoint': True},
    headers={'X-API-Key': 'your-api-key'}
)
print(response.json())
```

## Configuration

The application uses a YAML configuration file (`config.yaml`) for all settings. You can also use environment variables:

### Environment Variables

```bash
# SharePoint
export SHAREPOINT_ENABLED=true
export SHAREPOINT_SITE_URL="https://yourcompany.sharepoint.com/sites/YourSite"
export SHAREPOINT_FOLDER_PATH="/Shared Documents/Reports"
export SHAREPOINT_CLIENT_ID="your-client-id"
export SHAREPOINT_CLIENT_SECRET="your-client-secret"
export SHAREPOINT_USE_APP_AUTH=true

# Scheduler
export SCHEDULER_ENABLED=true
export SCHEDULER_INTERVAL_HOURS=24

# API
export API_ENABLED=true
export API_PORT=8080
export API_KEY="your-api-key"
```

## Complete Workflow Example

Here's a complete example setup for automated SharePoint → Report generation:

1. **Setup configuration** (`config.yaml`):
   ```yaml
   sharepoint:
     enabled: true
     site_url: "https://company.sharepoint.com/sites/Reports"
     folder_path: "/Shared Documents/Vendor Reports"
     use_app_authentication: true
     client_id: "your-client-id"
     client_secret: "your-client-secret"

   scheduler:
     enabled: true
     schedule_type: "cron"
     cron_expression: "0 8 * * *"  # 8 AM daily
     timezone: "America/New_York"

   report:
     output_dir: "output"
     reports_dir: "reports"
   ```

2. **Start scheduler**:
   ```bash
   python scheduler.py
   ```

3. **The scheduler will**:
   - Download latest Excel files from SharePoint at 8 AM daily
   - Generate reports automatically
   - Save to `output/report.json` and `output/report.html`

## License

[Add your license information here]

## Contributing

[Add contribution guidelines if applicable]

## Support

For issues or questions, please contact [your contact information or issue tracker URL].