HTML parsers are software for automated Hypertext Markup Language (HTML) parsing. They have two main purposes:
Parser | License | Implementation language(s) | Latest date* | HTML parsing[1] | HTML5-compliant parsing | Clean HTML** | Update HTML*** |
---|---|---|---|---|---|---|---|
HTML Tidy | W3C license | ANSI C | 2021-07-17[2] | Yes[3] | Yes | Yes[3] | Yes |
HtmlUnit | Apache License 2.0 | Java | 2023-10-31[4] | Yes | ? | No | No |
Beautiful Soup | MIT License | Python | 2023-04-07[5] | Yes | Yes | ? | No |
jsoup | MIT License | Java | 2023-12-29[6] | Yes | Yes | Yes | Yes |
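
As a rough illustration of what the "HTML parsing" column refers to, the sketch below walks a fragment of markup and collects its link targets. It uses the `html.parser` module from Python's standard library, which is not one of the parsers compared in the table. The class name `LinkCollector`, the helper `extract_links`, and the sample markup are hypothetical names chosen for this example only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, where value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(markup):
    """Return the href targets found in markup, in document order."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


# Hypothetical sample input with mixed-case tag and attribute names.
sample = '<p>See <a href="https://example.org/a">A</a> and <A HREF="/b">B</a>.</p>'
print(extract_links(sample))
```

This event-driven style only reads the document. The "Clean HTML" and "Update HTML" columns describe further capabilities (repairing invalid markup and writing modified markup back out) that this sketch does not attempt.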