# Vendor Report Generator

A Python tool that generates comprehensive vendor punchlist reports from Excel files. The tool processes Excel data, normalizes vendor information, calculates metrics, and generates both JSON and interactive HTML reports.

> **📘 For Taskboard Integration**: See [TASKBOARD_INTEGRATION_CONTEXT.md](./TASKBOARD_INTEGRATION_CONTEXT.md) for detailed context and integration possibilities.

## Features

- **Direct Excel Processing**: Reads Excel files directly using pandas
- **Data Normalization**: Automatically normalizes vendor names, statuses, and priorities
- **24-Hour Updates**: Tracks items added, closed, or changed to monitor status in the last 24 hours (based on Baltimore/Eastern timezone)
- **Priority Tracking**: Groups items by priority levels (Very High, High, Medium, Low)
- **Oldest Unaddressed Items**: Identifies and highlights the oldest 3 unaddressed items per vendor
- **Interactive HTML Reports**: Generates searchable, filterable HTML reports with tabs and filters
- **JSON Export**: Exports structured JSON data for further processing
- **SharePoint Integration**: Automatically download Excel files from SharePoint
- **Scheduled Generation**: Automatically generate reports on a schedule (interval or cron)
- **Web API**: REST API for on-demand report generation

## Requirements

- Python 3.8 or higher
- Dependencies listed in `requirements.txt`

## Installation

1. **Clone the repository**:
   ```bash
   git clone https://gitea.lci.ge/ilia.gurielidze/vendor_report.git
   cd vendor_report
   ```

2. **Create a virtual environment** (recommended):
   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

## Setup

1. **Prepare your Excel files**:
   - Place your Excel files (`.xlsx` or `.xls`) in the `reports/` directory
   - Ensure your Excel files have the following columns (in order):
     - Column 0: Punchlist Name
     - Column 1: Vendor
     - Column 2: Priority
     - Column 3: Description
     - Column 4: Date Identified
     - Column 5: Status Updates
     - Column 6: Issue Image
     - Column 7: Status
     - Column 8: Date Completed (optional)

2. **Create necessary directories** (if they don't exist):
   ```bash
   mkdir -p reports output
   ```

## Usage

### Basic Usage

Generate a report from Excel files in the `reports/` directory:

```bash
python3 report_generator.py
```

This will:
- Process all Excel files in the `reports/` directory
- Generate a JSON report at `output/report.json`
- Generate an HTML report at `output/report.html`
- Save preprocessed data to `output/preprocessed_data.txt`

### Command-Line Options

```bash
python3 report_generator.py [OPTIONS]
```

**Options**:
- `--reports-dir DIR`: Directory containing Excel files (default: `reports`)
- `--output FILE`: Output JSON file path (default: `output/report.json`)
- `--verbose`: Print verbose output (default: True)

**Examples**:

```bash
# Use a custom reports directory
python3 report_generator.py --reports-dir /path/to/excel/files

# Specify custom output file
python3 report_generator.py --output /path/to/output/report.json

# Combine options
python3 report_generator.py --reports-dir my_reports --output my_output/report.json
```

### Programmatic Usage

You can also use the report generator in your own Python scripts:

```python
from report_generator import generate_report

# Generate report with default settings
report_data = generate_report()

# Or with custom settings
report_data = generate_report(
    reports_dir="my_reports",
    output_file="my_output/report.json",
    verbose=True
)

# report_data is a dictionary containing the full report structure
print(f"Processed {len(report_data['vendors'])} vendors")
```

## Report Structure

### JSON Report Structure

The generated JSON report follows this structure:

```json
{
  "report_generated_at": "2025-11-05T22:00:00",
  "vendors": [
    {
      "vendor_name": "VendorName",
      "total_items": 10,
      "closed_count": 5,
      "open_count": 3,
      "monitor_count": 2,
      "updates_24h": {
        "added": [...],
        "closed": [...],
        "changed_to_monitor": [...]
      },
      "oldest_unaddressed": [...],
      "very_high_priority_items": [...],
      "high_priority_items": [...],
      "closed_items": [...],
      "monitor_items": [...],
      "open_items": [...]
    }
  ],
  "summary": {
    "total_vendors": 5,
    "total_items": 50,
    "total_closed": 25,
    "total_open": 15,
    "total_monitor": 10
  }
}
```

### HTML Report Features

The HTML report includes:

- **Summary Cards**: Overview statistics at the top
- **Vendor Tabs**: Quick navigation between vendors
- **Status Tabs**: Filter by status (All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open)
- **Search & Filters**: 
  - Search by item name or description
  - Filter by vendor, status, or priority
- **Quick Filters**: 
  - Show only vendors with yesterday's updates
  - Show only vendors with oldest unaddressed items
  - Show all vendors
- **Interactive Elements**: Click tabs to switch views, use filters to narrow down results

## Data Processing Details

### Vendor Name Normalization

The tool automatically normalizes vendor names:
- Handles case variations (e.g., "autstand" → "Autstand")
- Preserves intentional capitalization (e.g., "AutStand" stays as-is)
- Normalizes combined vendors (e.g., "Autstand/Beumer")
- Handles vendors in parentheses (e.g., "MFO (Amazon)")

### Status Normalization

Statuses are normalized to:
- **Complete**: Items with status containing "complete" or "complette"
- **Monitor**: Items with status containing "monitor" or "montor"
- **Incomplete**: All other items (default)

### Priority Classification

Priorities are classified as:
- **Very High**: Priority contains "(1) Very High" or "Very High"
- **High**: Priority contains "(2) High" or "High" (but not "Very High")
- **Medium**: Priority contains "(3) Medium" or "Medium"
- **Low**: Priority contains "(4) Low" or "Low"

### 24-Hour Window Calculation

The tool uses **Baltimore/Eastern timezone (America/New_York)** for calculating 24-hour updates:
- Items are considered "added in last 24h" if their `date_identified` falls on yesterday's date
- Items are considered "closed in last 24h" if their `date_completed` falls on yesterday's date
- Items are considered "changed to monitor" if their status is Monitor and the date falls within the 24-hour window

## Output Files

After running the generator, you'll find:

- `output/report.json`: Structured JSON report data
- `output/report.html`: Interactive HTML report (open in browser)
- `output/preprocessed_data.txt`: Human-readable preprocessed data (for debugging)

## Project Structure

```
vendor_report/
├── report_generator.py      # Main report generation script
├── data_preprocessor.py     # Excel data preprocessing and normalization
├── html_generator.py        # HTML report generation
├── models.py                # Pydantic data models
├── excel_to_text.py         # Utility for Excel to text conversion
├── sharepoint_downloader.py # SharePoint file downloader
├── scheduler.py             # Scheduled report generation
├── api_server.py            # REST API for on-demand reports
├── web_ui.py                # Web UI for easy access
├── config.py                # Configuration management
├── config.yaml.template     # Configuration template
├── requirements.txt         # Python dependencies
├── reports/                 # Directory for input Excel files
├── output/                  # Directory for generated reports
└── README.md               # This file
```

## Troubleshooting

### No Excel files found

Ensure your Excel files are in the `reports/` directory and have `.xlsx` or `.xls` extensions.

### Date parsing errors

The tool supports common date formats:
- `MM/DD/YY` (e.g., `10/14/25`)
- `MM/DD/YYYY` (e.g., `10/14/2025`)
- `YYYY-MM-DD` (e.g., `2025-10-17`)
- `YYYY-MM-DD HH:MM:SS` (e.g., `2025-10-17 00:00:00`)

### Permission errors

If you encounter permission errors, ensure you have write access to the `output/` directory.

### Missing dependencies

If you get import errors, ensure all dependencies are installed:
```bash
pip install -r requirements.txt
```

## Timezone Notes

The tool uses **Baltimore/Eastern timezone (America/New_York)** for all date calculations. This ensures consistent 24-hour window calculations regardless of where the script is run. All dates are stored as timezone-aware datetime objects.

## SharePoint Integration

The application can automatically download Excel files from SharePoint before generating reports. This is useful when your source data is stored in SharePoint.

### Setup SharePoint Integration

1. **Create a configuration file**:
   ```bash
   cp config.yaml.template config.yaml
   ```

2. **Edit `config.yaml`** and configure SharePoint settings:
   ```yaml
   sharepoint:
     enabled: true
     site_url: "https://yourcompany.sharepoint.com/sites/YourSite"
     folder_path: "/Shared Documents/Reports"
     local_dir: "reports"
     use_app_authentication: true  # Recommended for automation
     client_id: "your-azure-ad-client-id"
     client_secret: "your-azure-ad-client-secret"
   ```

3. **Authentication Options**:
   
   **Option A: App Authentication (Recommended)**
   - Register an app in Azure AD
   - Grant SharePoint permissions (Sites.Read.All or Sites.ReadWrite.All)
   - Use `client_id` and `client_secret` in config
   - Set `use_app_authentication: true`
   
   **Option B: User Authentication**
   - Use your SharePoint username and password
   - Set `username` and `password` in config
   - Set `use_app_authentication: false`

4. **Test SharePoint download**:
   ```bash
   python sharepoint_downloader.py
   ```

### Manual SharePoint Download

Download files from SharePoint without generating a report:
```bash
python sharepoint_downloader.py
```

## Scheduled Report Generation

The application can automatically generate reports on a schedule, optionally downloading from SharePoint first.

### Setup Scheduling

1. **Edit `config.yaml`**:
   ```yaml
   scheduler:
     enabled: true
     schedule_type: "interval"  # or "cron"
     interval_hours: 24  # Generate every 24 hours
     # OR use cron expression:
     # cron_expression: "0 8 * * *"  # 8 AM daily
     timezone: "America/New_York"
   ```

2. **Start the scheduler**:
   ```bash
   python scheduler.py
   ```

   The scheduler will run continuously and generate reports according to your schedule.

3. **Schedule Types**:
   - **interval**: Generate report every N hours
   - **cron**: Use cron expression for precise scheduling (e.g., "0 8 * * *" for 8 AM daily)
   - **once**: Run once immediately (for testing)

### Running Scheduler as a Service

**Linux (systemd)**:
```bash
# Create service file: /etc/systemd/system/vendor-report-scheduler.service
[Unit]
Description=Vendor Report Scheduler
After=network.target

[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/vendor_report
ExecStart=/usr/bin/python3 /path/to/vendor_report/scheduler.py
Restart=always

[Install]
WantedBy=multi-user.target

# Enable and start
sudo systemctl enable vendor-report-scheduler
sudo systemctl start vendor-report-scheduler
```

**Windows (Task Scheduler)**:
- Create a scheduled task that runs `python scheduler.py` at startup or on a schedule

## Web UI & On-Demand Report Generation

The application includes both a **Web UI** and a **REST API** for generating reports on demand.

### Web UI (Recommended for Easy Access)

A simple, user-friendly web interface for generating reports without using the terminal.

1. **Start the Web UI server**:
   ```bash
   python web_ui.py
   ```

2. **Open in browser**:
   ```
   http://localhost:8080
   ```

3. **Features**:
   - One-click report generation
   - Download from SharePoint & generate (single button)
   - View generated reports
   - View service status
   - View configuration
   - No terminal knowledge required!

### REST API

The application also includes a REST API for integration with other systems or manual triggers.

### Setup API Server

1. **Edit `config.yaml`**:
   ```yaml
   api:
     enabled: true
     host: "0.0.0.0"
     port: 8080
     api_key: "your-secret-api-key"  # Optional, for authentication
   ```

2. **Start the Web UI** (recommended):
   ```bash
   python web_ui.py
   ```
   
   Then open `http://localhost:8080` in your browser.
   
   **OR start the API server** (for programmatic access):
   ```bash
   python api_server.py
   ```

3. **Generate report via API**:
   ```bash
   # Without authentication
   curl -X POST http://localhost:8080/api/generate \
     -H "Content-Type: application/json" \
     -d '{"download_from_sharepoint": true}'
   
   # With API key authentication
   curl -X POST http://localhost:8080/api/generate \
     -H "Content-Type: application/json" \
     -H "X-API-Key: your-secret-api-key" \
     -d '{"download_from_sharepoint": true}'
   ```

### API Endpoints

- **POST `/api/generate`**: Generate report on demand
  - Request body (optional):
    ```json
    {
      "download_from_sharepoint": true,
      "reports_dir": "reports",
      "output_file": "output/report.json"
    }
    ```
  
- **GET `/api/status`**: Get service status and configuration
  
- **GET `/health`**: Health check endpoint

### Example: Integration with Webhook

You can trigger report generation from SharePoint webhooks, Power Automate, or any HTTP client:

```python
import requests

response = requests.post(
    'http://your-server:8080/api/generate',
    json={'download_from_sharepoint': True},
    headers={'X-API-Key': 'your-api-key'}
)
print(response.json())
```

## Configuration

The application uses a YAML configuration file (`config.yaml`) for all settings. You can also use environment variables:

### Environment Variables

```bash
# SharePoint
export SHAREPOINT_ENABLED=true
export SHAREPOINT_SITE_URL="https://yourcompany.sharepoint.com/sites/YourSite"
export SHAREPOINT_FOLDER_PATH="/Shared Documents/Reports"
export SHAREPOINT_CLIENT_ID="your-client-id"
export SHAREPOINT_CLIENT_SECRET="your-client-secret"
export SHAREPOINT_USE_APP_AUTH=true

# Scheduler
export SCHEDULER_ENABLED=true
export SCHEDULER_INTERVAL_HOURS=24

# API
export API_ENABLED=true
export API_PORT=8080
export API_KEY="your-api-key"
```

## Complete Workflow Example

Here's a complete example setup for automated SharePoint → Report generation:

1. **Setup configuration** (`config.yaml`):
   ```yaml
   sharepoint:
     enabled: true
     site_url: "https://company.sharepoint.com/sites/Reports"
     folder_path: "/Shared Documents/Vendor Reports"
     use_app_authentication: true
     client_id: "your-client-id"
     client_secret: "your-client-secret"
   
   scheduler:
     enabled: true
     schedule_type: "cron"
     cron_expression: "0 8 * * *"  # 8 AM daily
     timezone: "America/New_York"
   
   report:
     output_dir: "output"
     reports_dir: "reports"
   ```

2. **Start scheduler**:
   ```bash
   python scheduler.py
   ```

3. **The scheduler will**:
   - Download latest Excel files from SharePoint at 8 AM daily
   - Generate reports automatically
   - Save to `output/report.json` and `output/report.html`

## License

[Add your license information here]

## Contributing

[Add contribution guidelines if applicable]

## Support

For issues or questions, please contact [your contact information or issue tracker URL].