Add comprehensive README with setup and usage instructions

ilia gu 2025-11-05 22:43:06 +04:00
parent 8f217a87c2
commit f294ac155e

# Vendor Report Generator
A Python tool that generates comprehensive vendor punchlist reports from Excel files. The tool processes Excel data, normalizes vendor information, calculates metrics, and generates both JSON and interactive HTML reports.
## Features
- **Direct Excel Processing**: Reads Excel files directly using pandas (no LLM required)
- **Data Normalization**: Automatically normalizes vendor names, statuses, and priorities
- **24-Hour Updates**: Tracks items added, closed, or changed to Monitor status on the previous day ("yesterday" in the Baltimore/Eastern timezone, `America/New_York`)
- **Priority Tracking**: Groups items by priority levels (Very High, High, Medium, Low)
- **Oldest Unaddressed Items**: Identifies and highlights the three oldest unaddressed items for each vendor
- **Interactive HTML Reports**: Generates searchable, filterable HTML reports with tabs and filters
- **JSON Export**: Exports structured JSON data for further processing
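The per-vendor "oldest unaddressed" selection can be sketched with `heapq.nsmallest`. This is a hypothetical helper, not the tool's actual code. It assumes each item is a dict with `status` and `date_identified` keys, that dates are already parsed to `datetime.date`, and that "unaddressed" means the normalized status is `Incomplete` (neither Complete nor Monitor); the README does not define the term precisely.

```python
import heapq
from datetime import date
from typing import Any, Dict, List

Item = Dict[str, Any]


def oldest_unaddressed(items: List[Item], limit: int = 3) -> List[Item]:
    """Return up to `limit` unaddressed items, oldest first (hypothetical helper).

    Assumption: "unaddressed" means status == "Incomplete"; items without a
    parsed date_identified are skipped.
    """
    candidates = [
        item
        for item in items
        if item.get("status") == "Incomplete" and item.get("date_identified") is not None
    ]
    # nsmallest returns results sorted ascending by key, i.e. oldest first.
    return heapq.nsmallest(limit, candidates, key=lambda item: item["date_identified"])


# Made-up sample items for illustration.
items = [
    {"name": "A", "status": "Incomplete", "date_identified": date(2025, 10, 14)},
    {"name": "B", "status": "Complete", "date_identified": date(2025, 9, 1)},
    {"name": "C", "status": "Incomplete", "date_identified": date(2025, 9, 30)},
    {"name": "D", "status": "Monitor", "date_identified": date(2025, 8, 1)},
    {"name": "E", "status": "Incomplete", "date_identified": date(2025, 10, 1)},
    {"name": "F", "status": "Incomplete", "date_identified": date(2025, 10, 20)},
]
print([item["name"] for item in oldest_unaddressed(items)])  # → ['C', 'E', 'A']
```

`heapq.nsmallest` avoids sorting every item when only the first few are needed, although for per-vendor lists of this size a plain `sorted(...)[:3]` would behave identically.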
## Requirements
- Python 3.8 or higher
- Dependencies listed in `requirements.txt`
## Installation
1. **Clone the repository**:
   ```bash
   git clone https://gitea.lci.ge/ilia.gurielidze/vendor_report.git
   cd vendor_report
   ```
2. **Create a virtual environment** (recommended):
   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
3. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
## Setup
1. **Prepare your Excel files**:
   - Place your Excel files (`.xlsx` or `.xls`) in the `reports/` directory
   - Ensure your Excel files have the following columns (in order):
     - Column 0: Punchlist Name
     - Column 1: Vendor
     - Column 2: Priority
     - Column 3: Description
     - Column 4: Date Identified
     - Column 5: Status Updates
     - Column 6: Issue Image
     - Column 7: Status
     - Column 8: Date Completed (optional)
2. **Create necessary directories** (if they don't exist):
   ```bash
   mkdir -p reports output
   ```
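Because columns are identified by position rather than by header text, each row can be mapped onto named fields by index. The sketch below is hypothetical: the field names are illustrative (only `date_identified` and `date_completed` appear elsewhere in this README), and it uses plain Python sequences rather than pandas so it runs without extra dependencies.

```python
from typing import Any, Dict, Optional, Sequence

# Positional layout documented above: index i maps to COLUMNS[i].
COLUMNS = (
    "punchlist_name",
    "vendor",
    "priority",
    "description",
    "date_identified",
    "status_updates",
    "issue_image",
    "status",
    "date_completed",  # optional, may be missing or empty
)
REQUIRED = len(COLUMNS) - 1  # every column except Date Completed


def row_to_item(row: Sequence[Any]) -> Optional[Dict[str, Any]]:
    """Map one spreadsheet row to a dict keyed by field name (hypothetical helper).

    Returns None for rows that are too short or entirely blank.
    """
    if len(row) < REQUIRED:
        return None
    if all(cell is None or str(cell).strip() == "" for cell in row):
        return None
    # Pad a missing Date Completed cell with None; ignore any extra columns.
    padded = list(row[: len(COLUMNS)]) + [None] * (len(COLUMNS) - len(row))
    return dict(zip(COLUMNS, padded))


# Made-up sample row without a Date Completed cell.
row = ["Line 1", "autstand", "(2) High", "Jam at merge", "10/14/25", "", None, "Open"]
item = row_to_item(row)
print(item["vendor"], item["date_completed"])  # → autstand None
```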
## Usage
### Basic Usage
Generate a report from Excel files in the `reports/` directory:
```bash
python3 report_generator.py
```
This will:
- Process all Excel files in the `reports/` directory
- Generate a JSON report at `output/report.json`
- Generate an HTML report at `output/report.html`
- Save preprocessed data to `output/preprocessed_data.txt`
### Command-Line Options
```bash
python3 report_generator.py [OPTIONS]
```
**Options**:
- `--reports-dir DIR`: Directory containing Excel files (default: `reports`)
- `--output FILE`: Output JSON file path (default: `output/report.json`)
- `--verbose`: Print verbose output (default: True)
**Examples**:
```bash
# Use a custom reports directory
python3 report_generator.py --reports-dir /path/to/excel/files
# Specify custom output file
python3 report_generator.py --output /path/to/output/report.json
# Combine options
python3 report_generator.py --reports-dir my_reports --output my_output/report.json
```
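The options above map directly onto the standard-library `argparse` module. The parser below is a hypothetical reconstruction from this option list, not the script's actual code. It also illustrates a quirk of the documented default: with `action="store_true"` and `default=True`, `--verbose` is always on and cannot be switched off from the command line.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(description="Generate vendor punchlist reports.")
    parser.add_argument(
        "--reports-dir",
        default="reports",
        metavar="DIR",
        help="Directory containing Excel files (default: reports)",
    )
    parser.add_argument(
        "--output",
        default="output/report.json",
        metavar="FILE",
        help="Output JSON file path (default: output/report.json)",
    )
    # store_true with default=True means the flag is effectively always on;
    # this mirrors the documented default rather than recommending it.
    parser.add_argument(
        "--verbose",
        action="store_true",
        default=True,
        help="Print verbose output (default: True)",
    )
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


args = parse_args(["--reports-dir", "my_reports"])
print(args.reports_dir, args.output, args.verbose)  # → my_reports output/report.json True
```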
### Programmatic Usage
You can also use the report generator in your own Python scripts:
```python
from report_generator import generate_report
# Generate report with default settings
report_data = generate_report()
# Or with custom settings
report_data = generate_report(
    reports_dir="my_reports",
    output_file="my_output/report.json",
    verbose=True
)
# report_data is a dictionary containing the full report structure
print(f"Processed {len(report_data['vendors'])} vendors")
```
## Report Structure
### JSON Report Structure
The generated JSON report follows this structure:
```json
{
  "report_generated_at": "2025-11-05T22:00:00",
  "vendors": [
    {
      "vendor_name": "VendorName",
      "total_items": 10,
      "closed_count": 5,
      "open_count": 3,
      "monitor_count": 2,
      "updates_24h": {
        "added": [...],
        "closed": [...],
        "changed_to_monitor": [...]
      },
      "oldest_unaddressed": [...],
      "very_high_priority_items": [...],
      "high_priority_items": [...],
      "closed_items": [...],
      "monitor_items": [...],
      "open_items": [...]
    }
  ],
  "summary": {
    "total_vendors": 5,
    "total_items": 50,
    "total_closed": 25,
    "total_open": 15,
    "total_monitor": 10
  }
}
```
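In the example above, each vendor's `total_items` equals `closed_count + open_count + monitor_count` (10 = 5 + 3 + 2), and the summary totals add up the same way (50 = 25 + 15 + 10). Assuming these relationships hold in general (the README does not state this explicitly), a small check like the following can catch inconsistencies when consuming the JSON report. The function name `check_report` is hypothetical.

```python
import json
from typing import Any, Dict, List


def check_report(report: Dict[str, Any]) -> List[str]:
    """Return human-readable inconsistencies in a report dict (hypothetical helper).

    Assumes every item is counted exactly once as closed, open, or monitor.
    """
    problems: List[str] = []
    vendors = report["vendors"]
    for vendor in vendors:
        parts = vendor["closed_count"] + vendor["open_count"] + vendor["monitor_count"]
        if parts != vendor["total_items"]:
            problems.append(
                f"{vendor['vendor_name']}: counts sum to {parts}, "
                f"but total_items is {vendor['total_items']}"
            )
    # Recompute the summary from the per-vendor figures and compare.
    expected = {
        "total_vendors": len(vendors),
        "total_items": sum(v["total_items"] for v in vendors),
        "total_closed": sum(v["closed_count"] for v in vendors),
        "total_open": sum(v["open_count"] for v in vendors),
        "total_monitor": sum(v["monitor_count"] for v in vendors),
    }
    for key, value in expected.items():
        actual = report["summary"].get(key)
        if actual != value:
            problems.append(f"summary.{key} is {actual}, expected {value}")
    return problems


# Minimal one-vendor report using the per-vendor counts from the example above.
sample = json.loads("""
{
  "vendors": [{"vendor_name": "VendorName", "total_items": 10,
               "closed_count": 5, "open_count": 3, "monitor_count": 2}],
  "summary": {"total_vendors": 1, "total_items": 10, "total_closed": 5,
              "total_open": 3, "total_monitor": 2}
}
""")
print(check_report(sample))  # → []
```

To check a generated file, load `output/report.json` with `json.load` and pass the result to `check_report`.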
### HTML Report Features
The HTML report includes:
- **Summary Cards**: Overview statistics at the top
- **Vendor Tabs**: Quick navigation between vendors
- **Status Tabs**: Filter by status (All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open)
- **Search & Filters**:
  - Search by item name or description
  - Filter by vendor, status, or priority
- **Quick Filters**:
  - Show only vendors with yesterday's updates
  - Show only vendors with oldest unaddressed items
  - Show all vendors
- **Interactive Elements**: Click tabs to switch views, use filters to narrow down results
## Data Processing Details
### Vendor Name Normalization
The tool automatically normalizes vendor names:
- Handles case variations (e.g., "autstand" → "Autstand")
- Preserves intentional capitalization (e.g., "AutStand" stays as-is)
- Normalizes combined vendors (e.g., "Autstand/Beumer")
- Handles vendors in parentheses (e.g., "MFO (Amazon)")
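The rules above can be approximated with a couple of regular expressions. This is a hypothetical sketch, not the implementation in `data_preprocessor.py`: it capitalizes only words that are entirely lowercase (so "AutStand" and "MFO" are left alone), removes spaces around "/" in combined vendors, and collapses repeated whitespace.

```python
import re

# Matches a whole word made only of lowercase ASCII letters, e.g. "autstand".
_LOWERCASE_WORD = re.compile(r"\b([a-z])([a-z]*)\b")


def normalize_vendor(name: str) -> str:
    """Normalize a raw vendor name (hypothetical helper)."""
    name = " ".join(name.split())         # trim and collapse whitespace
    name = re.sub(r"\s*/\s*", "/", name)  # "autstand / beumer" -> "autstand/beumer"
    # Capitalize all-lowercase words; mixed-case words such as "AutStand" are kept.
    return _LOWERCASE_WORD.sub(lambda m: m.group(1).upper() + m.group(2), name)


for raw in ["autstand", "AutStand", "autstand / beumer", "MFO (Amazon)"]:
    print(f"{raw!r} -> {normalize_vendor(raw)!r}")
```

Because `\b` only matches between word and non-word characters, a word like "AutStand" never matches the lowercase pattern, which is what preserves intentional capitalization here.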
### Status Normalization
Statuses are normalized to:
- **Complete**: Items whose status contains "complete" (the misspelling "complette" is also accepted)
- **Monitor**: Items whose status contains "monitor" (the misspelling "montor" is also accepted)
- **Incomplete**: All other items (default)
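A minimal sketch of these substring rules, assuming case-insensitive matching. One subtlety: "incomplete" itself contains "complete", so a naive substring test would classify a raw status of "Incomplete" as Complete. This hypothetical helper checks for "incomplete" first; the README does not say how the actual tool handles that case.

```python
from typing import Optional


def normalize_status(raw: Optional[str]) -> str:
    """Map a raw status string to Complete, Monitor, or Incomplete (hypothetical helper)."""
    text = (raw or "").strip().lower()
    if "incomplete" in text:
        # Guard: "incomplete" contains "complete" as a substring.
        return "Incomplete"
    if "complete" in text or "complette" in text:  # "complette" is a tolerated typo
        return "Complete"
    if "monitor" in text or "montor" in text:      # "montor" is a tolerated typo
        return "Monitor"
    return "Incomplete"                            # default for everything else


for raw in ["Complete", "complette", "Montor", "Incomplete", "In progress", None]:
    print(repr(raw), "->", normalize_status(raw))
```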
### Priority Classification
Priorities are classified as:
- **Very High**: Priority contains "(1) Very High" or "Very High"
- **High**: Priority contains "(2) High" or "High" (but not "Very High")
- **Medium**: Priority contains "(3) Medium" or "Medium"
- **Low**: Priority contains "(4) Low" or "Low"
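Since "(1) Very High" already contains "Very High", a single containment test per level covers both forms, provided "Very High" is checked before "High". The sketch below is hypothetical: it assumes case-insensitive matching and returns `None` for values that match no level, which the README does not specify.

```python
from typing import Optional

# Checked in this order so that "Very High" is never mistaken for "High".
_PRIORITY_LEVELS = ("Very High", "High", "Medium", "Low")


def classify_priority(raw: Optional[str]) -> Optional[str]:
    """Return the priority level named in `raw`, or None if none matches (hypothetical)."""
    text = (raw or "").lower()
    for level in _PRIORITY_LEVELS:
        if level.lower() in text:
            return level
    return None


for raw in ["(1) Very High", "(2) High", "Medium", "(4) Low", ""]:
    print(repr(raw), "->", classify_priority(raw))
```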
### 24-Hour Window Calculation
The tool uses **Baltimore/Eastern timezone (America/New_York)** for calculating 24-hour updates:
- Items are considered "added in last 24h" if their `date_identified` falls on yesterday's date
- Items are considered "closed in last 24h" if their `date_completed` falls on yesterday's date
- Items are considered "changed to monitor" if their status is Monitor and the date falls within the 24-hour window
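A sketch of the "yesterday" check using the standard-library `zoneinfo` module. Note that `zoneinfo` requires Python 3.9+ (plus system timezone data or the `tzdata` package), whereas this README lists 3.8 as the minimum, so the tool itself may use a different timezone library. The helper names are hypothetical, items are assumed to be dicts whose `date_identified` and `date_completed` values are `datetime.date` objects, and the "changed to monitor" rule is omitted because the README does not say which date it uses.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Any, Dict, List, Optional
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")


def yesterday_in_new_york(now: Optional[datetime] = None) -> date:
    """Return yesterday's calendar date as seen in Baltimore/Eastern time."""
    if now is None:
        now = datetime.now(tz=NEW_YORK)
    elif now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    return now.astimezone(NEW_YORK).date() - timedelta(days=1)


def updates_for_yesterday(
    items: List[Dict[str, Any]], now: Optional[datetime] = None
) -> Dict[str, List[Dict[str, Any]]]:
    """Split items into those added and closed on yesterday's date (hypothetical)."""
    yesterday = yesterday_in_new_york(now)
    return {
        "added": [i for i in items if i.get("date_identified") == yesterday],
        "closed": [i for i in items if i.get("date_completed") == yesterday],
    }


# 02:00 UTC on Nov 6 is still 21:00 on Nov 5 in New York, so "yesterday" is Nov 4.
now = datetime(2025, 11, 6, 2, 0, tzinfo=timezone.utc)
print(yesterday_in_new_york(now))  # → 2025-11-04

items = [
    {"name": "new", "date_identified": date(2025, 11, 4), "date_completed": None},
    {"name": "fixed", "date_identified": date(2025, 10, 1), "date_completed": date(2025, 11, 4)},
    {"name": "old", "date_identified": date(2025, 10, 1), "date_completed": None},
]
updates = updates_for_yesterday(items, now)
print([i["name"] for i in updates["added"]], [i["name"] for i in updates["closed"]])
# → ['new'] ['fixed']
```

Computing the date in `America/New_York` (rather than from the machine's local clock or UTC) is what keeps the result independent of where the script runs, as the example timestamp shows.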
## Output Files
After running the generator, you'll find:
- `output/report.json`: Structured JSON report data
- `output/report.html`: Interactive HTML report (open in browser)
- `output/preprocessed_data.txt`: Human-readable preprocessed data (for debugging)
## Project Structure
```
vendor_report/
├── report_generator.py # Main report generation script
├── data_preprocessor.py # Excel data preprocessing and normalization
├── html_generator.py # HTML report generation
├── models.py # Pydantic data models
├── excel_to_text.py # Utility for Excel to text conversion
├── requirements.txt # Python dependencies
├── reports/ # Directory for input Excel files
├── output/ # Directory for generated reports
└── README.md # This file
```
## Troubleshooting
### No Excel files found
Ensure your Excel files are in the `reports/` directory and have `.xlsx` or `.xls` extensions.
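To see which files the generator should be picking up, a directory listing along these lines can help. This is a hypothetical helper, not the tool's own discovery code: it scans only the top level of the directory, matches extensions case-insensitively, and skips the `~$` lock files that Excel creates while a workbook is open.

```python
import tempfile
from pathlib import Path
from typing import List, Union

EXCEL_SUFFIXES = {".xlsx", ".xls"}


def find_excel_files(reports_dir: Union[str, Path] = "reports") -> List[Path]:
    """List Excel workbooks directly inside reports_dir (hypothetical helper)."""
    directory = Path(reports_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"Reports directory not found: {directory.resolve()}")
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file()
        and path.suffix.lower() in EXCEL_SUFFIXES
        and not path.name.startswith("~$")  # Excel lock file, not a real workbook
    )


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.xlsx", "B.XLS", "~$a.xlsx", "notes.txt"):
        (Path(tmp) / name).touch()
    found_names = sorted(p.name for p in find_excel_files(tmp))
print(found_names)  # → ['B.XLS', 'a.xlsx']
```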
### Date parsing errors
If dates are not being parsed, check that they use one of the supported formats:
- `MM/DD/YY` (e.g., `10/14/25`)
- `MM/DD/YYYY` (e.g., `10/14/2025`)
- `YYYY-MM-DD` (e.g., `2025-10-17`)
- `YYYY-MM-DD HH:MM:SS` (e.g., `2025-10-17 00:00:00`)
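A sketch of parsing these formats with `datetime.strptime`, trying each format in turn and attaching the `America/New_York` timezone. This is hypothetical: it assumes naive values represent Eastern wall-clock time, accepts cells that already arrive as `date`/`datetime` objects (pandas `Timestamp` is a `datetime` subclass), and returns `None` for blank or unrecognized values rather than raising. Like the previous sketch, it needs Python 3.9+ for `zoneinfo`.

```python
from datetime import date, datetime
from typing import Any, Optional
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")

# The formats listed above, tried in order. "%m/%d/%y" rejects "10/14/2025"
# (unconverted trailing digits), so the four-digit-year format still gets a turn.
DATE_FORMATS = ("%m/%d/%y", "%m/%d/%Y", "%Y-%m-%d", "%Y-%m-%d %H:%M:%S")


def parse_date(value: Any) -> Optional[datetime]:
    """Parse a cell value into an aware datetime in America/New_York (hypothetical)."""
    if value is None:
        return None
    if isinstance(value, datetime):
        # Convert aware values; assume naive values are already Eastern time.
        return value.astimezone(NEW_YORK) if value.tzinfo else value.replace(tzinfo=NEW_YORK)
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day, tzinfo=NEW_YORK)
    text = str(value).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).replace(tzinfo=NEW_YORK)
        except ValueError:
            continue
    return None


for raw in ["10/14/25", "10/14/2025", "2025-10-17", "2025-10-17 00:00:00", "n/a"]:
    print(repr(raw), "->", parse_date(raw))
```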
### Permission errors
If you encounter permission errors, ensure you have write access to the `output/` directory.
### Missing dependencies
If you get import errors, ensure all dependencies are installed:
```bash
pip install -r requirements.txt
```
## Timezone Notes
The tool uses **Baltimore/Eastern timezone (America/New_York)** for all date calculations. This ensures consistent 24-hour window calculations regardless of where the script is run. All dates are stored as timezone-aware datetime objects.
## License
[Add your license information here]
## Contributing
[Add contribution guidelines if applicable]
## Support
For issues or questions, please contact [your contact information or issue tracker URL].