This document describes the architecture and important technical details of the SwayReports project.
The application uses BeautifulSoup for HTML parsing and generation. While the application previously used the soup.encode(formatter='minimal').decode() method, we’ve now implemented a more robust approach for handling HTML entities, particularly ampersands.
# Old way to write BeautifulSoup HTML to files - has issues with double escaping
with open(file_path, 'w', encoding='utf-8') as f:
    f.write(soup.encode(formatter='minimal').decode())
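The double escaping can be reproduced without BeautifulSoup. The sketch below uses the standard library's html.escape as a stand-in for the serializer's entity substitution. That substitution is an assumption made for illustration; the project's own write path goes through soup.encode. It shows what happens when text that still contains an entity is escaped a second time, and how decoding first avoids it:

```python
import html

# Text copied out of markup without decoding still carries the entity.
raw_title = "Research &amp; Development"

# Escaping on write treats the leading "&" as a literal ampersand,
# so the entity itself gets escaped.
double_escaped = html.escape(raw_title, quote=False)
print(double_escaped)   # Research &amp;amp; Development

# Decoding first, then escaping exactly once, gives the intended markup.
clean_title = html.unescape(raw_title)
single_escaped = html.escape(clean_title, quote=False)
print(single_escaped)   # Research &amp; Development
```

A browser renders the first result as the literal text "&amp;" instead of an ampersand, which is the symptom described in this document.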
Our improved solution addresses issues with HTML entities like &amp;amp; being incorrectly re-encoded as &amp;amp;amp; when written to files:
import html
from bs4 import BeautifulSoup
# Reading content with html.unescape to handle existing entities
title = html.unescape(element.get_text().strip())
# Writing content properly
title_div = soup.new_tag('div')
title_div.clear()
title_html = BeautifulSoup(f"<span>{title}</span>", 'html.parser')
title_div.append(title_html.span.contents[0])
This approach uses html.unescape() when reading text to convert any entities to their character equivalents.
In extensive testing, this approach correctly preserved ampersands and other special characters throughout the HTML generation pipeline.
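One way to check the "preserved throughout the pipeline" property is to run a title through several read/write cycles and confirm that the markup stops changing. The following is a minimal, library-independent sketch: read_title and write_title are hypothetical helper names, and html.escape again stands in for the serializer, so it illustrates the idea and does not reproduce the project's code.

```python
import html


def read_title(markup_text: str) -> str:
    """Decode entities so the title is held as plain characters."""
    return html.unescape(markup_text.strip())


def write_title(title: str) -> str:
    """Escape exactly once when serializing the title back to markup."""
    return html.escape(title, quote=False)


original = "Q&amp;A: Tom &amp; Jerry"
markup = original

# Simulate the title being read and rewritten several times,
# as happens when files are regenerated repeatedly.
for _ in range(3):
    markup = write_title(read_title(markup))

# The markup is unchanged, so no extra "amp;" has accumulated.
print(markup == original)
```

Because decoding always happens before the single escape on write, the number of cycles does not matter. If the read step is skipped, each cycle adds another "amp;" to every ampersand.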
These scripts both implement the improved HTML entity handling to ensure ampersands and other special characters are displayed correctly in the browser.
This utility script can fix any double-escaped entities in existing showcase files, for example turning &amp;amp; back into &amp;.
A separate script synchronizes titles and snippets between the showcase HTML file and individual report HTML files. It ensures consistency in how reports are presented across the application.
The script: