About Categenie
Intelligent data cleaning for attribute-value standardization
Our Mission
Transform messy, inconsistent attribute data into clean, structured, and validated information that's ready for analysis and application.
How It Works
Remove Prefixes
Strip "rp_" and other prefixes from values
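The prefix step could look roughly like the sketch below. It is a minimal sketch, not Categenie's own code. The function name remove_prefix and the KNOWN_PREFIXES tuple are hypothetical, and only "rp_" comes from the description above.

```python
# Hypothetical prefix list: "rp_" is named in the description; others are assumptions.
KNOWN_PREFIXES = ("rp_",)


def remove_prefix(value: str, prefixes=KNOWN_PREFIXES) -> str:
    """Strip the first known prefix (case-insensitive) plus surrounding whitespace."""
    text = value.strip()
    for prefix in prefixes:
        if text.lower().startswith(prefix.lower()):
            return text[len(prefix):].strip()
    return text
```

For example, remove_prefix("rp_12 in") gives "12 in", and a value without a known prefix is returned unchanged apart from trimmed whitespace.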
Numerical Without Units
Extract plain numbers, decimals, and fractions
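One way to recognize a unitless number is sketched below. It assumes that "pure" means the whole value is a single integer, decimal, or simple fraction. The name parse_plain_number is hypothetical, and the standard-library fractions.Fraction handles the string conversion.

```python
import re
from fractions import Fraction
from typing import Optional

# Integers, decimals (including ".5"), and simple fractions such as "3/4".
NUMBER_RE = re.compile(r"[+-]?(?:\d+(?:\.\d+)?|\.\d+|\d+/\d+)")


def parse_plain_number(value: str) -> Optional[float]:
    """Return the value as a float, or None if it is not a bare number."""
    text = value.strip()
    if not NUMBER_RE.fullmatch(text):
        return None
    try:
        return float(Fraction(text))
    except ZeroDivisionError:  # e.g. "1/0"
        return None
```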
Numerical With Units
Validate and standardize unit measurements
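Unit validation can be pictured as a lookup against a table of accepted spellings, as in the sketch below. The alias table, its canonical symbols, and the name parse_value_with_unit are all assumptions for illustration. An unknown unit returns None, which is how a value would end up flagged for review.

```python
import re
from typing import Optional, Tuple

# Hypothetical alias table: raw spellings -> canonical unit symbol.
UNIT_ALIASES = {
    "in": "in", "inch": "in", "inches": "in", '"': "in",
    "mm": "mm", "millimeter": "mm", "millimeters": "mm",
    "lb": "lb", "lbs": "lb", "pound": "lb", "pounds": "lb",
}

# A number followed by a unit token that does not start with a digit.
VALUE_UNIT_RE = re.compile(r"\s*([+-]?\d+(?:\.\d+)?)\s*([^\d\s].*?)\s*")


def parse_value_with_unit(value: str) -> Optional[Tuple[float, str]]:
    """Return (number, canonical_unit), or None if the unit is missing or unknown."""
    m = VALUE_UNIT_RE.fullmatch(value)
    if not m:
        return None
    unit = UNIT_ALIASES.get(m.group(2).lower().rstrip("."))
    if unit is None:
        return None  # unknown unit: a candidate for review
    return float(m.group(1)), unit
```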
Non-Numerical Values
Classify categorical and varchar data
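The page does not say how categorical and varchar values are told apart. One common heuristic is to call a column categorical when it has few distinct values that repeat often. The sketch below uses that heuristic. The thresholds and the name classify_text_values are assumptions.

```python
from typing import Iterable, Optional


def classify_text_values(
    values: Iterable[Optional[str]], max_unique: int = 20, max_ratio: float = 0.5
) -> str:
    """Label a column of non-numeric values as "categorical" or "varchar".

    Assumed heuristic: few distinct values (case-insensitive) that repeat often -> categorical.
    """
    cleaned = [v.strip() for v in values if v is not None and v.strip()]
    if not cleaned:
        return "varchar"
    n_unique = len({v.casefold() for v in cleaned})
    if n_unique <= max_unique and n_unique / len(cleaned) <= max_ratio:
        return "categorical"
    return "varchar"
```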
Thread Sizes
Parse thread specifications (e.g., 1/2"-13)
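For imperial callouts like the 1/2"-13 example, the value splits into a nominal diameter and a threads-per-inch count. The sketch below handles only that pattern. Metric forms such as "M8x1.25" are out of scope here. The name parse_thread_size and the output keys are hypothetical.

```python
import re
from fractions import Fraction
from typing import Optional

# Diameter (fraction or decimal), optional inch mark, "-", threads per inch.
THREAD_RE = re.compile(r'\s*(\d+/\d+|\d+(?:\.\d+)?)\s*"?\s*-\s*(\d+)\s*')


def parse_thread_size(value: str) -> Optional[dict]:
    """Parse e.g. 1/2"-13 into diameter (inches) and threads per inch."""
    m = THREAD_RE.fullmatch(value)
    if not m:
        return None
    try:
        diameter = Fraction(m.group(1))
    except ZeroDivisionError:  # e.g. "1/0-13"
        return None
    tpi = int(m.group(2))
    if diameter == 0 or tpi == 0:
        return None
    return {"diameter_in": float(diameter), "threads_per_inch": tpi}
```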
Dual Units
Handle compound units (e.g., cu in, degree F)
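A two-word unit such as "cu in" or "degree F" is easy to mis-split when the parser only expects a single unit token. The sketch below matches the unit as two tokens and checks it against a table. The table contents, its display forms, and the name parse_dual_unit are assumptions.

```python
import re
from typing import Optional, Tuple

# Hypothetical table of two-word units (lower-cased keys) and their display form.
DUAL_UNITS = {
    "cu in": "cu in", "cubic inch": "cu in", "cubic inches": "cu in",
    "degree f": "degree F", "degrees f": "degree F", "deg f": "degree F",
}

DUAL_RE = re.compile(r"\s*([+-]?\d+(?:\.\d+)?)\s+(\S+\s+\S+)\s*")


def parse_dual_unit(value: str) -> Optional[Tuple[float, str]]:
    """Return (number, unit) when the unit is a known two-word unit."""
    m = DUAL_RE.fullmatch(value)
    if not m:
        return None
    key = " ".join(m.group(2).lower().split())  # collapse repeated spaces
    unit = DUAL_UNITS.get(key)
    return (float(m.group(1)), unit) if unit else None
```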
Mixed Fractions
Convert mixed numbers (e.g., 2 3/4 in)
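Converting a mixed number amounts to adding the whole part to the fraction, as in 2 3/4 = 2 + 3/4 = 2.75. The sketch below does this exactly with fractions.Fraction before converting to a float, and passes the unit through unchanged. The name parse_mixed_fraction is hypothetical.

```python
import re
from fractions import Fraction
from typing import Optional, Tuple

MIXED_RE = re.compile(r"\s*(\d+)\s+(\d+)/(\d+)\s*(.*?)\s*")


def parse_mixed_fraction(value: str) -> Optional[Tuple[float, str]]:
    """Convert e.g. "2 3/4 in" to (2.75, "in"); the unit text is passed through unchanged."""
    m = MIXED_RE.fullmatch(value)
    if not m or int(m.group(3)) == 0:
        return None
    whole = int(m.group(1))
    frac = Fraction(int(m.group(2)), int(m.group(3)))
    return float(whole + frac), m.group(4)
```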
Dimensions
Split L x W x H into separate values
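Splitting dimensions could work as in the sketch below, which assumes exactly three parts separated by "x" (or "×") and a single shared unit. The name parse_dimensions and the output keys are hypothetical. Mixed units return None so that a person can review them. A unit name that itself contains the letter "x" would confuse this simple split.

```python
import re
from typing import Optional

PART_RE = re.compile(r'(\d+(?:\.\d+)?)\s*([A-Za-z"]*)')


def parse_dimensions(value: str) -> Optional[dict]:
    """Split "L x W x H" into length, width, height, and one shared unit."""
    parts = re.split(r"\s*[x×]\s*", value.strip(), flags=re.IGNORECASE)
    if len(parts) != 3:
        return None
    numbers, units = [], set()
    for part in parts:
        m = PART_RE.fullmatch(part)
        if not m:
            return None
        numbers.append(float(m.group(1)))
        if m.group(2):
            units.add(m.group(2).lower())
    if len(units) > 1:
        return None  # mixed units: leave for review
    length, width, height = numbers
    return {"length": length, "width": width, "height": height,
            "unit": units.pop() if units else ""}
```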
"To" Ranges
Parse range values (e.g., 100 to 500 km)
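A "to" range can be read as a low value, an optional unit, the word "to", a high value, and a unit. The sketch below accepts both "100 to 500 km" and "100 km to 500 km". It rejects conflicting units and reversed bounds, an assumed policy that sends those values to review. The name parse_to_range is hypothetical.

```python
import re
from typing import Optional

TO_RANGE_RE = re.compile(
    r"\s*([+-]?\d+(?:\.\d+)?)\s*([A-Za-z]*)\s+to\s+([+-]?\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*",
    re.IGNORECASE,
)


def parse_to_range(value: str) -> Optional[dict]:
    """Parse "100 to 500 km" (or "100 km to 500 km") into min, max, and unit."""
    m = TO_RANGE_RE.fullmatch(value)
    if not m:
        return None
    low, high = float(m.group(1)), float(m.group(3))
    first_unit, second_unit = m.group(2), m.group(4)
    if first_unit and second_unit and first_unit != second_unit:
        return None  # e.g. "5 mm to 2 in": conflicting units
    if low > high:
        return None  # reversed range
    return {"min": low, "max": high, "unit": second_unit or first_unit}
```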
"+/-" Ranges
Handle tolerance values (e.g., +/- 0.05)
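A tolerance such as 10 +/- 0.05 mm describes the interval [10 - 0.05, 10 + 0.05] = [9.95, 10.05] mm. The sketch below computes that interval when a nominal value is present. It also accepts a bare "+/- 0.05" or the "±" sign. The name parse_tolerance and the output keys are hypothetical.

```python
import re
from typing import Optional

TOL_RE = re.compile(r'\s*([+-]?\d+(?:\.\d+)?)?\s*(?:\+/-|±)\s*(\d+(?:\.\d+)?)\s*([A-Za-z"]*)\s*')


def parse_tolerance(value: str) -> Optional[dict]:
    """Parse "10 +/- 0.05 mm" (or a bare "+/- 0.05") into nominal, tolerance, and bounds."""
    m = TOL_RE.fullmatch(value)
    if not m:
        return None
    tol = float(m.group(2))
    nominal = float(m.group(1)) if m.group(1) else None
    result = {"nominal": nominal, "tolerance": tol, "unit": m.group(3)}
    if nominal is not None:
        result["min"] = nominal - tol  # lower bound of the tolerance band
        result["max"] = nominal + tol  # upper bound of the tolerance band
    return result
```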
Moderation Filter
Flag ambiguous values for review
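The page does not say what makes a value ambiguous. One reasonable reading is "no parser accepts it, or more than one does". The sketch below runs every parser on the value and accepts it only when exactly one matches. The function moderate and the two toy parsers are hypothetical. They show that "3/4" could be a fraction or a month/day date.

```python
import re
from typing import Any, Callable, Dict, Optional, Tuple

Parser = Callable[[str], Optional[Any]]


def moderate(value: str, parsers: Dict[str, Parser]) -> Tuple[str, str, Any]:
    """Return ("clean", parser_name, parsed) if exactly one parser accepts the value,
    otherwise ("review", reason, original_value)."""
    if not value or not value.strip():
        return ("review", "empty value", value)
    matches = {}
    for name, parse in parsers.items():
        result = parse(value)
        if result is not None:
            matches[name] = result
    if len(matches) == 1:
        name, parsed = next(iter(matches.items()))
        return ("clean", name, parsed)
    if not matches:
        return ("review", "no parser matched", value)
    return ("review", "ambiguous: " + ", ".join(sorted(matches)), value)


# Two toy parsers (hypothetical) that disagree about values like "3/4".
def as_fraction(v: str) -> Optional[float]:
    m = re.fullmatch(r"\s*(\d+)/(\d+)\s*", v)
    if not m or int(m.group(2)) == 0:
        return None
    return int(m.group(1)) / int(m.group(2))


def as_month_day(v: str) -> Optional[Tuple[int, int]]:
    m = re.fullmatch(r"\s*(\d{1,2})/(\d{1,2})\s*", v)
    if not m:
        return None
    month, day = int(m.group(1)), int(m.group(2))
    return (month, day) if 1 <= month <= 12 and 1 <= day <= 31 else None
```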
What You Get
Cleaned Data
- Standardized units
- Validated formats
- Structured data types
- Human-readable display values
- Ready for immediate use
Needs Review
- Invalid units flagged
- Parsing failures identified
- Edge cases highlighted
- Ambiguous patterns marked
- Full context preserved
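Because the page says Categenie is built on pandas, the two outputs could be produced by splitting one DataFrame, as sketched below. The column names, the small unit pattern, the reason text, and the function name split_clean_and_review are assumptions. Both frames keep every original column, which preserves the full context of each flagged row.

```python
import re
from typing import Tuple

import pandas as pd

# Hypothetical pattern: a number followed by one of a few known units.
VALUE_RE = r"^\s*(?P<number>\d+(?:\.\d+)?)\s*(?P<unit>in|mm|lb)\s*$"


def split_clean_and_review(df: pd.DataFrame, column: str = "value") -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Return (cleaned, needs_review); every original column is kept in both frames."""
    parts = df[column].str.extract(VALUE_RE, flags=re.IGNORECASE)
    ok = parts["number"].notna()
    cleaned = df.loc[ok].assign(
        number=parts.loc[ok, "number"].astype(float),
        unit=parts.loc[ok, "unit"].str.lower(),
        # Human-readable display value such as "12 in".
        display=lambda d: d["number"].map("{:g}".format) + " " + d["unit"],
    )
    needs_review = df.loc[~ok].assign(reason="unrecognized number or unit")
    return cleaned, needs_review
```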
Technology
Categenie is built with Python, Flask, and pandas. It uses pattern matching and validation rules to clean and structure your data automatically.
Python
Flask
pandas