�
*�Kg�B����dZddlZddlZddlmZdgZejd��Zejd��Zejd��Z ejd��Z
ejd ��Zejd
��Zejd��Z
ejd��Zejd
��Zejdej��Zejd
��Zejd��ZGd�dej��ZdS)zA parser for HTML and XHTML.�N)�unescape�
HTMLParserz[&<]z
&[a-zA-Z#]z%&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]z)&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]z <[a-zA-Z]�>z--\s*>z+([a-zA-Z][^\t\n\r\f />\x00]*)(?:\s|/(?!>))*z]((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?(?:\s|/(?!>))*aF
<[a-zA-Z][^\t\n\r\f />\x00]* # tag name
(?:[\s/]* # optional whitespace before attribute name
(?:(?<=['"\s/])[^\s/>][^\s/=>]* # attribute name
(?:\s*=+\s* # value indicator
(?:'[^']*' # LITA-enclosed value
|"[^"]*" # LIT-enclosed value
|(?!['"])[^>\s]* # bare value
)
\s* # possibly followed by a space
)?(?:\s|/(?!>))*
)*
)?
\s* # trailing whitespace
z#</\s*([a-zA-Z][-.a-zA-Z0-9:_]*)\s*>c��eZdZdZdZdd�d�Zd�Zd�Zd�Zd Z d
�Z
d�Zd�Zd
�Z
d�Zdd�Zd�Zd�Zd�Zd�Zd�Zd�Zd�Zd�Zd�Zd�Zd�Zd�Zd�Zd�Zd S) raEFind tags and other markup and call handler functions.
Usage:
p = HTMLParser()
p.feed(data)
...
p.close()
Start tags are handled by calling self.handle_starttag() or
self.handle_startendtag(); end tags by self.handle_endtag(). The
data between tags is passed from the parser to the derived class
by calling self.handle_data() with the data as argument (the data
may be split up in arbitrary chunks). If convert_charrefs is
True the character references are converted automatically to the
corresponding Unicode character (and self.handle_data() is no
longer split in chunks), otherwise they are passed by calling
self.handle_entityref() or self.handle_charref() with the string
containing respectively the named or numeric reference as the
argument.
)�script�styleT)�convert_charrefsc�<�||_|���dS)z�Initialize and reset this instance.
If convert_charrefs is True (the default), all character references
are automatically converted to the corresponding Unicode characters.
N)r �reset)�selfr s �"/usr/lib/python3.11/html/parser.py�__init__zHTMLParser.__init__Vs��!1����
�
������c��d|_d|_t|_d|_t
j�|��dS)z1Reset this instance. Loses all unprocessed data.�z???N)�rawdata�lasttag�interesting_normal�interesting�
cdata_elem�_markupbase�
ParserBaser�rs r
rzHTMLParser.reset_s<��������-��������$�$�T�*�*�*�*�*rc�N�|j|z|_|�d��dS)z�Feed data to the parser.
Call this as often as you want, with as little or as much text
as you want (may include '\n').
rN)r�goahead�r�datas r
�feedzHTMLParser.feedgs%���|�d�*������Q�����rc�0�|�d��dS)zHandle any buffered data.�N)rrs r
�closezHTMLParser.closeps�����Q�����rNc��|jS)z)Return full source of start tag: '<...>'.)�_HTMLParser__starttag_textrs r
�get_starttag_textzHTMLParser.get_starttag_textvs���#�#rc��|���|_tjd|jztj��|_dS)Nz</\s*%s\s*>)�lowerr�re�compile�Ir)r�elems r
�set_cdata_modezHTMLParser.set_cdata_modezs4���*�*�,�,����:�n�t��&F���M�M����rc�,�t|_d|_dS�N)rrrrs r
�clear_cdata_modezHTMLParser.clear_cdata_mode~s��-�������rc��|j}d}t|��}||k�r#|jr}|jsv|�d|��}|dkrY|�dt
||dz
����}|dkr*tjd��� ||��s�n�|}n=|j
� ||��}|r|���}n|jr�nd|}||krV|jr2|js+|�t|||�����n|�|||���|�||��}||kr�n�|j}|d|���r�t �||��r|�|��} n�|d|��r|�|��} n�|d|��r|�|��} nj|d|��r|�|��} nH|d |��r|�|��} n&|d
z|kr|�d��|d
z} n�n�| dkr�|s�n�|�d|d
z��} | dkr%|�d|d
z��} | dkr|d
z} n| d
z
} |jr2|js+|�t||| �����n|�||| ���|�|| ��}�n-|d|��r�t.�||��}|rq|���d
d�}
|�|
��|���} |d| d
z
��s| d
z
} |�|| ��}���d||d�vr9|�|||d
z���|�||d
z��}�nS|d|���r5t6�||��}|rj|�d
��}
|�|
��|���} |d| d
z
��s| d
z
} |�|| ��}��kt:�||��}|rX|rU|���||d�kr5|���} | |kr|} |�||d
z��}nJ|d
z|kr/|�d��|�||d
z��}nnJd���||k��#|ry||krs|jsl|jr2|js+|�t|||�����n|�|||���|�||��}||d�|_dS)Nr�<�&�"z[\s;]�</�<!--�<?�<!r rz&#�����;zinteresting.search() lied)r�lenr r�find�rfind�maxr'r(�searchr�start�handle_datar� updatepos�
startswith�starttagopen�match�parse_starttag�parse_endtag�
parse_comment�parse_pi�parse_html_declaration�charref�group�handle_charref�end� entityref�handle_entityref�
incomplete)rrMr�i�n�j�ampposrDrB�k�names r
rzHTMLParser.goahead�s8���,��
����L�L���!�e�e��$�
�T�_�
��L�L��a�(�(���q�5�5�%�]�]�3��A�q��t���=�=�F��!����J�x�0�0�7�7���H�H�$���A���(�/�/���;�;�������
�
�A�A������A��1�u�u��(�3���3��$�$�X�g�a��c�l�%;�%;�<�<�<�<��$�$�W�Q�q�S�\�2�2�2����q�!�$�$�A��A�v�v�u� �+�J��z�#�q�!�!�J
6��%�%�g�q�1�1���+�+�A�.�.�A�A��Z��a�(�(���)�)�!�,�,�A�A��Z���*�*�
��*�*�1�-�-�A�A��Z��a�(�(���
�
�a�(�(�A�A��Z��a�(�(���3�3�A�6�6�A�A��!�e�q�[�[��$�$�S�)�)�)��A��A�A���q�5�5�������S�!�a�%�0�0�A��1�u�u�#�L�L��a�!�e�4�4���q�5�5� !�A��A���Q����,�7�T�_�7��(�(��'�!�A�#�,�)?�)?�@�@�@�@��(�(���1���6�6�6��N�N�1�a�(�(�����D�!�$�$�+
6��
�
�g�q�1�1���� �;�;�=�=��2��.�D��'�'��-�-�-�� � ���A�%�:�c�1�Q�3�/�/�"���E�����q�!�,�,�A���g�a�b�b�k�)�)��(�(���1�Q�3���8�8�8� �N�N�1�a��c�2�2�����C��#�#�
6�!�����3�3���� �;�;�q�>�>�D��)�)�$�/�/�/�� � ���A�%�:�c�1�Q�3�/�/�"���E�����q�!�,�,�A��"�(�(��!�4�4�����5�u�{�{�}�}������;�;�!�I�I�K�K����6�6� !�A� �N�N�1�a�!�e�4�4����!�e�q�[�[��$�$�S�)�)�)����q�!�a�%�0�0�A�A��5�5�5�5�5�S�!�e�e�V� %�1�q�5�5���5��$�
/�T�_�
/�� � ��'�!�A�#�,�!7�!7�8�8�8�8�� � ���1���.�.�.����q�!�$�$�A��q�r�r�{����rc���|j}|||dz�dks
Jd���|||dz�dkr|�|��S|||dz�dkr|�|��S|||dz����d krF|�d
|dz��}|dkrdS|�||dz|���|dzS|�|��S)
Nr7r6z+unexpected call to parse_html_declaration()�r4�z<![� z <!doctyperr8r )rrG�parse_marked_sectionr&r;�handle_decl�parse_bogus_comment)rrQr�gtposs r
rIz!HTMLParser.parse_html_declaration�s���,���q��1��u�~��%�%�%�)C�%�%�%��1�Q�q�S�5�>�V�#�#��%�%�a�(�(�(�
�Q�q��s�U�^�u�
$�
$��,�,�Q�/�/�/�
�Q�q��s�U�^�
!�
!�
#�
#�{�
2�
2��L�L��a��c�*�*�E���{�{��r����W�Q�q�S��Y�/�0�0�0���7�N��+�+�A�.�.�.rr c���|j}|||dz�dvs
Jd���|�d|dz��}|dkrdS|r |�||dz|���|dzS)Nr7)r6r3z"unexpected call to parse_comment()rr8r )rr;�handle_comment)rrQ�reportr�poss r
r]zHTMLParser.parse_bogus_comments����,���q��1��u�~��-�-�-�1B�-�-�-��l�l�3��!��$�$���"�9�9��2�� 2������!��C�� 0�1�1�1��Q�w�rc��|j}|||dz�dks
Jd���t�||dz��}|sdS|���}|�||dz|���|���}|S)Nr7r5zunexpected call to parse_pi()r8)r�picloser>r?� handle_pirM)rrQrrDrSs r
rHzHTMLParser.parse_pi s����,���q��1��u�~��%�%�%�'F�%�%�%����w��!��,�,��� ��2��K�K�M�M�����w�q��s�A�v��'�'�'��I�I�K�K���rc�V�d|_|�|��}|dkr|S|j}|||�|_g}t�||dz��}|s
Jd���|���}|�d�����x|_}||kr�t�||��}|sn�|�ddd��\} }
}|
sd}nI|dd�dcxkr|dd�ks"n|dd�dcxkr|dd�kr
nn
|dd�}|rt|��}|�| ���|f��|���}||k��|||����}|d vr|�
|||���|S|�d
��r|�||��n4|�||��||jvr|�|��|S)Nrr z#unexpected call to parse_starttag()r7rY�'r8�")r�/>ri)r#�check_for_whole_start_tagr�tagfind_tolerantrDrMrKr&r�attrfind_tolerantr�append�stripr@�endswith�handle_startendtag�handle_starttag�CDATA_CONTENT_ELEMENTSr+)
rrQ�endposr�attrsrDrU�tag�m�attrname�rest� attrvaluerMs
r
rEzHTMLParser.parse_starttag,sj��#����/�/��2�2���A�:�:��M��,��&�q��x�0����� �&�&�w��!��4�4���;�;�;�;�;�;��I�I�K�K��"�[�[��^�^�1�1�3�3�3���s��&�j�j�!�'�'���3�3�A��
��()����1�a�(8�(8�%�H�d�I��
,� � � ��2�A�2��$�8�8�8�8�)�B�C�C�.�8�8�8�8��2�A�2��#�7�7�7�7��2�3�3��7�7�7�7�7�%�a��d�O� ��
0�$�Y�/�/� ��L�L�(�.�.�*�*�I�6�7�7�7������A��&�j�j��a��h��%�%�'�'���k�!�!����W�Q�v�X�.�/�/�/��M��<�<���� )��#�#�C��/�/�/�/�� � ��e�,�,�,��d�1�1�1��#�#�C�(�(�(��
rc��|j}t�||��}|r�|���}|||dz�}|dkr|dzS|dkr@|�d|��r|dzS|�d|��rdS||kr|S|dzS|dkrdS|dvrdS||kr|S|dzStd ���)
Nr r�/rir7r8rz6abcdefghijklmnopqrstuvwxyz=/ABCDEFGHIJKLMNOPQRSTUVWXYZzwe should not get here!)r�locatestarttagend_tolerantrDrMrB�AssertionError)rrQrrvrS�nexts r
rjz$HTMLParser.check_for_whole_start_tagXs���,��&�,�,�W�a�8�8��� ������A��1�Q�q�S�5�>�D��s�{�{��1�u���s�{�{��%�%�d�A�.�.�!��q�5�L��%�%�c�1�-�-���2��q�5�5��H��q�5�L��r�z�z��r��5�6�6��r��1�u�u����1�u���6�7�7�7rc��|j}|||dz�dks
Jd���t�||dz��}|sdS|���}t�||��}|s�|j�|�|||���|St�||dz��}|s+|||dz�dkr|dzS|� |��S|�
d�����}|�d|�����}|�
|��|dzS|�
d�����}|j�*||jkr|�|||���|S|�
|��|���|S) Nr7r3zunexpected call to parse_endtagr r8rYz</>r)r� endendtagr>rM�
endtagfindrDrr@rkr]rKr&r;�
handle_endtagr.)rrQrrDr^� namematch�tagnamer*s r
rFzHTMLParser.parse_endtagzs����,���q��1��u�~��%�%�%�'H�%�%�%�� � ��!�A�#�.�.��� ��2�� � ����� � ��!�,�,��� ���*�� � ���5��!1�2�2�2���(�.�.�w��!��<�<�I��
7��1�Q�q�S�5�>�U�*�*��Q�3�J��3�3�A�6�6�6��o�o�a�(�(�.�.�0�0�G�
�L�L��i�m�m�o�o�6�6�E����w�'�'�'���7�N��{�{�1�~�~�#�#�%�%���?�&��t��&�&�� � ���5��!1�2�2�2������4� � � ��������rc�\�|�||��|�|��dSr-)rqr��rrurts r
rpzHTMLParser.handle_startendtag�s2�����S�%�(�(�(����3�����rc��dSr-�r�s r
rqzHTMLParser.handle_starttag�����rc��dSr-r�)rrus r
r�zHTMLParser.handle_endtag�r�rc��dSr-r��rrVs r
rLzHTMLParser.handle_charref�r�rc��dSr-r�r�s r
rOzHTMLParser.handle_entityref�r�rc��dSr-r�rs r
r@zHTMLParser.handle_data�r�rc��dSr-r�rs r
r`zHTMLParser.handle_comment�r�rc��dSr-r�)r�decls r
r\zHTMLParser.handle_decl�r�rc��dSr-r�rs r
rezHTMLParser.handle_pi�r�rc��dSr-r�rs r
�unknown_declzHTMLParser.unknown_decl�r�r)r )�__name__�
__module__�__qualname__�__doc__rrrrrr!r#r$r+r.rrIr]rHrErjrFrprqr�rLrOr@r`r\rer�r�rr
rr>s���������*1��+/������+�+�+��������O�$�$�$�N�N�N����u#�u#�u#�t/�/�/�* � � � � � � �(�(�(�X8�8�8�D%�%�%�P � � �
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
r)r�r'r�htmlr�__all__r(rrPrNrJrCrd�commentcloserkrl�VERBOSEr|r�r�rrr�rr
�<module>r�sm��"�"�
� � � ������������.�� �R�Z��'�'��
�R�Z��
%�
%�
��B�J�>�?�?� �
�"�*�@�
A�
A���r�z�+�&�&��
�"�*�S�/�/���r�z�)�$�$���2�:�L�M�M���B�J�=�>�>��(�R�Z�)��Z����
�B�J�s�O�O� ��R�Z�>�
?�
?�
�I
�I
�I
�I
�I
��'�I
�I
�I
�I
�I
r |