
■ Shows how to use ".*" in a pattern built with the compile function to match a string greedily.
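As a minimal offline sketch of what "greedy" means here: ".*" consumes as much text as it can, so "<.*>" runs from the first "<" to the last ">" it can reach. The sample string and variable names below are hypothetical and are not part of the original post; the non-greedy form "<.*?>" is shown only for contrast.

```python
import re

# Hypothetical sample text, chosen only to illustrate the behaviour.
sample = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" extends to the LAST ">" in the text, giving one long match.
greedy_matches = re.compile(r"<.*>").findall(sample)

# Non-greedy: ".*?" stops at the FIRST ">" it meets, giving one match per tag.
lazy_matches = re.compile(r"<.*?>").findall(sample)

print(greedy_matches)  # ['<b>bold</b> and <i>italic</i>']
print(lazy_matches)    # ['<b>', '</b>', '<i>', '</i>']
```

The greedy result is a single element even though the text contains four tags, which is the behaviour this post demonstrates.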

 

▶ Example code (PY)

import urllib.request
import re

# Download the page; the with block closes the response automatically.
with urllib.request.urlopen("http://www.example.com") as httpResponse:
    htmlBytes = httpResponse.read()

# Decode the bytes directly. str(htmlBytes) would return the "b'...'" repr with escaped newlines.
html = htmlBytes.decode("utf-8")

# Greedy ".*" spans from the first "<" to the last ">"; re.S lets "." match newlines as well.
pattern = re.compile(r"<.*>", re.I | re.S)

list1 = pattern.findall(html)

print(list1)

"""
['<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset="utf-8" />\n    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1" />\n    <style type="text/css">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href="https://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</html>']
"""
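The pattern above combines the greedy ".*" with the re.S flag. A small offline sketch of what that flag changes, assuming a hypothetical two-line sample string (not from the original post): without re.S, "." does not match a newline, so a greedy match cannot run past the end of a line; with re.S it can span several lines.

```python
import re

# Hypothetical two-line sample text used only for illustration.
text = "<p>one</p>\n<p>two</p>"

# Without re.S, "." stops at the newline, so each line yields its own greedy match.
per_line = re.findall(r"<.*>", text)

# With re.S, "." also matches the newline, so one greedy match covers both lines.
across_lines = re.findall(r"<.*>", text, re.S)

print(per_line)      # ['<p>one</p>', '<p>two</p>']
print(across_lines)  # ['<p>one</p>\n<p>two</p>']
```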
Posted by icodebroker
