API reference¶
- exception pylookyloo.AuthError¶
- class pylookyloo.CaptureSettings(**kwargs)¶
The capture settings that can be passed to Lookyloo.
- class pylookyloo.CompareSettings(**kwargs)¶
The settings that can be passed to the compare method on the Lookyloo side to filter out some differences.
- exception pylookyloo.PyLookylooError¶
Lookyloo¶
- class pylookyloo.Lookyloo(root_url: str | None = None, useragent: str | None = None, *, proxies: dict[str, str] | None = None, verify: bool | str = True)¶
- ai_export(tree_uuid: str) dict[str, Any]¶
Export the capture in a format suitable for feeding to a model.
- compare_captures(capture_left: str, capture_right: str, /, *, compare_settings: CompareSettings | dict[str, Any] | None = None) dict[str, Any]¶
Compares two captures
- Parameters:
capture_left – UUID of the capture to compare from
capture_right – UUID of the capture to compare to
compare_settings – The settings for the comparison itself (what to ignore without marking the captures as different)
- enqueue(url: str | None = None, quiet: bool = False, document: Path | BytesIO | None = None, document_name: str | None = None, **kwargs) str¶
Enqueue a URL.
- Parameters:
url – URL to enqueue
quiet – Returns the UUID only, instead of the whole URL
document – A document to submit to Lookyloo. It can be anything supported by a browser.
document_name – The name of the document (only if you passed a pseudofile).
kwargs – accepts all the parameters supported by Lookyloo.capture
- get_apikey(username: str, password: str) dict[str, str]¶
Get the API key for the given user.
- get_capture_stats(tree_uuid: str) dict[str, Any]¶
Get statistics of the capture
- get_categories_captures(category: str | None = None) list[str] | dict[str, list[str]] | None¶
Get uuids for a specific category or all categorized uuids if category is None
- Parameters:
category – The category according to which the uuids are to be returned
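Because get_categories_captures can return a list, a dict, or None depending on the call, calling code may find it easier to fold the three shapes into one. The sketch below is only an illustration: `categories_to_mapping` and the `"unknown"` fallback label are hypothetical names, not part of pylookyloo, and the meaning attached to a None result is an assumption.

```python
from __future__ import annotations

from typing import Any


def categories_to_mapping(result: Any, category: str | None = None) -> dict[str, list[str]]:
    """Fold the three documented return shapes into {category: [uuid, ...]}."""
    if result is None:
        # Assumption: None means there is nothing to report.
        return {}
    if isinstance(result, dict):
        # No category was requested: the result is already keyed by category.
        return {name: list(uuids) for name, uuids in result.items()}
    # A single category was requested: the result is a flat list of UUIDs.
    return {category or "unknown": list(result)}
```

With a client instance, this could be used as `categories_to_mapping(client.get_categories_captures(name), name)`.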
- get_comparables(tree_uuid: str) dict[str, Any]¶
Get comparable information from the capture
- get_complete_capture(capture_uuid: str) BytesIO¶
Returns a zip file that contains the screenshot, the HAR, the rendered HTML, and the cookies.
- Parameters:
capture_uuid – UUID of the capture
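Since get_complete_capture hands back the zip archive as an in-memory BytesIO, it can be inspected with the standard zipfile module without touching disk. A minimal sketch follows; `list_capture_files` is a hypothetical helper, and the member names inside a real archive are not specified here.

```python
import zipfile
from io import BytesIO


def list_capture_files(archive: BytesIO) -> list[str]:
    """Return the member names of a zip archive held in memory."""
    archive.seek(0)  # rewind in case the buffer was already read
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()
```

Typical use would be `list_capture_files(client.get_complete_capture(capture_uuid))`, where `client` is a Lookyloo instance.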
- get_cookies(capture_uuid: str) list[dict[str, str]]¶
Returns the complete cookie jar.
- Parameters:
capture_uuid – UUID of the capture
- get_data(capture_uuid: str) BytesIO¶
Returns the downloaded data.
- Parameters:
capture_uuid – UUID of the capture
- get_favicon_occurrences(favicon: str | BytesIO, *, cached_captures_only: bool = True, limit: int = 20, offset: int = 0) dict[str, Any]¶
Returns all the captures containing the favicon.
- Parameters:
favicon – Favicon to lookup. Either the hash, or the file in a BytesIO (hash will be generated on the fly)
cached_captures_only – If False, Lookyloo will attempt to re-cache the missing captures. It might take some time.
limit – The max amount of entries to return.
offset – The offset to start from, useful for pagination.
- get_favicons(capture_uuid: str) dict[str, Any]¶
Returns the potential favicons of the capture.
- Parameters:
capture_uuid – UUID of the capture
- get_hash_occurrences(h: str, *, with_urls_occurrences: bool = False, cached_captures_only: bool = True, limit: int = 20, offset: int = 0) dict[str, Any]¶
Returns the base64 body related to the hash, and a list of all the captures containing that hash.
- Parameters:
h – sha512 to search
with_urls_occurrences – If true, add details about the URLs from the URL nodes in the tree.
cached_captures_only – If False, Lookyloo will attempt to re-cache the missing captures. It might take some time.
limit – The max amount of entries to return.
offset – The offset to start from, useful for pagination.
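The limit and offset parameters shared by the `*_occurrences` methods lend themselves to a small paging loop. The sketch below only advances the offset. The layout of the returned dict is not described in this reference, so deciding when to stop is left to the caller. `iter_pages` and `max_pages` are hypothetical names.

```python
from __future__ import annotations

from typing import Any, Callable, Iterator


def iter_pages(fetch: Callable[..., dict[str, Any]], *, limit: int = 20,
               max_pages: int = 5) -> Iterator[tuple[int, dict[str, Any]]]:
    """Yield (offset, response) pairs, advancing offset by limit each time."""
    for page in range(max_pages):
        offset = page * limit
        # fetch is expected to accept limit and offset as keyword arguments.
        yield offset, fetch(limit=limit, offset=offset)
```

One way to bind the lookup value is `functools.partial(client.get_hash_occurrences, some_sha512)`, passed as `fetch`. The caller can break out of the loop as soon as a page looks empty.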
- get_hashes(capture_uuid: str, algorithm: str = 'sha512', hashes_only: bool = True) StringIO¶
Returns all the hashes of all the bodies (including the embedded contents)
- Parameters:
capture_uuid – UUID of the capture
algorithm – The algorithm of the hashes
hashes_only – If False, will also return the URLs related to the hashes
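get_hashes returns a StringIO rather than a list. Assuming the buffer holds one entry per line when hashes_only is True (the exact layout is not stated here, so treat this as an assumption to verify against your instance), the text can be split as sketched below. `read_hash_lines` is a hypothetical helper.

```python
from io import StringIO


def read_hash_lines(buffer: StringIO) -> list[str]:
    """Split a text buffer into non-empty, stripped lines."""
    # Assumption: each non-empty line carries one hash.
    return [line.strip() for line in buffer.getvalue().splitlines() if line.strip()]
```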
- get_hostname_occurrences(hostname: str, *, with_urls_occurrences: bool = False, cached_captures_only: bool = True, limit: int = 20, offset: int = 0) dict[str, Any]¶
Returns all the captures containing the hostname.
- Parameters:
hostname – Hostname to lookup
with_urls_occurrences – If true, add details about the related URLs.
cached_captures_only – If False, Lookyloo will attempt to re-cache the missing captures. It might take some time.
limit – The max amount of entries to return.
offset – The offset to start from, useful for pagination.
- get_hostnames(capture_uuid: str) dict[str, Any]¶
Returns all the hostnames seen during the capture.
- Parameters:
capture_uuid – UUID of the capture
- get_html(capture_uuid: str) StringIO¶
Returns the rendered HTML as it is in the browser after the page loaded.
- Parameters:
capture_uuid – UUID of the capture
- get_html_as_markdown(capture_uuid: str) StringIO¶
Returns the rendered HTML as it is in the browser after the page loaded, converted to Markdown.
- Parameters:
capture_uuid – UUID of the capture
- get_info(tree_uuid: str) dict[str, Any]¶
Get information about the capture (url, timestamp, user agent)
- get_ip_occurrences(ip: str, *, with_urls_occurrences: bool = False, cached_captures_only: bool = True, limit: int = 20, offset: int = 0) dict[str, Any]¶
Returns all the captures containing the IP address.
- Parameters:
ip – IP to lookup
with_urls_occurrences – If true, add details about the related URLs.
cached_captures_only – If False, Lookyloo will attempt to re-cache the missing captures. It might take some time.
limit – The max amount of entries to return.
offset – The offset to start from, useful for pagination.
- get_ips(capture_uuid: str) dict[str, Any]¶
Returns all the IPs seen during the capture.
- Parameters:
capture_uuid – UUID of the capture
- get_modules_responses(tree_uuid: str) dict[str, Any]¶
Returns information from the 3rd party modules
- Parameters:
tree_uuid – UUID of the capture
- get_recent_captures(timestamp: str | datetime | float | None = None) list[str]¶
Gets the uuids of the most recent captures
- Parameters:
timestamp – Oldest timestamp to consider
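The timestamp argument accepts a datetime, so a relative cutoff such as "the last 24 hours" can be computed before the call. A minimal sketch, where `since` is a hypothetical helper and the choice of a timezone-aware UTC value is an assumption about what the server expects:

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def since(hours: float, now: datetime | None = None) -> datetime:
    """Return the datetime `hours` before `now` (default: current UTC time)."""
    reference = now or datetime.now(timezone.utc)
    return reference - timedelta(hours=hours)
```

It would then be passed along as `client.get_recent_captures(timestamp=since(24))`.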
- get_redirects(capture_uuid: str) dict[str, Any]¶
Returns the initial redirects.
- Parameters:
capture_uuid – UUID of the capture
- get_remote_lacuses() list[dict[str, Any]]¶
Get the list of Lacus instances configured on the Lookyloo instance
- get_screenshot(capture_uuid: str) BytesIO¶
Returns the screenshot.
- Parameters:
capture_uuid – UUID of the capture
- get_stats() dict[str, Any]¶
Returns statistics about the Lookyloo instance.
- get_status(tree_uuid: str) dict[str, Any]¶
Get the status of a capture:
* -1: Unknown capture.
* 0: The capture is queued up but not processed yet.
* 1: The capture is ready.
* 2: The capture is ongoing and will be ready soon.
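The status codes documented for get_status suggest a simple polling loop after submitting a capture. The sketch below assumes the returned dict exposes the code under a `status_code` key, which this reference does not state. `wait_until_ready`, its timeout defaults, and the injectable `sleep` are hypothetical.

```python
from __future__ import annotations

import time
from typing import Any, Callable

READY = 1     # the capture is ready
UNKNOWN = -1  # the server does not know this capture


def wait_until_ready(client: Any, tree_uuid: str, *, timeout: float = 120.0,
                     interval: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_status until the capture is ready, unknown, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: the code lives under the "status_code" key.
        code = client.get_status(tree_uuid).get("status_code")
        if code == READY:
            return True
        if code == UNKNOWN or time.monotonic() >= deadline:
            return False
        # Codes 0 (queued) and 2 (ongoing) both mean "try again later".
        sleep(interval)
```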
- get_storage(capture_uuid: str) dict[str, Any]¶
Returns the complete storage state.
- Parameters:
capture_uuid – UUID of the capture
- get_takedown_information(capture_uuid: str, filter_contacts: Literal[True]) list[str]¶
- get_takedown_information(capture_uuid: str, filter_contacts: Literal[False] = False) list[dict[str, Any]]
Returns information required to request a takedown for a capture
- Parameters:
capture_uuid – UUID of the capture
filter_contacts – If True, will only return the contact emails and filter out the invalid ones.
- get_url_occurrences(url: str, *, with_urls_occurrences: bool = False, cached_captures_only: bool = True, limit: int = 20, offset: int = 0) dict[str, Any]¶
Returns all the captures containing the URL.
- Parameters:
url – URL to lookup
with_urls_occurrences – If true, add details about the URLs from the URL nodes in the tree.
cached_captures_only – If False, Lookyloo will attempt to re-cache the missing captures. It might take some time.
limit – The max amount of entries to return.
offset – The offset to start from, useful for pagination.
- get_urls(capture_uuid: str) dict[str, Any]¶
Returns all the URLs seen during the capture.
- Parameters:
capture_uuid – UUID of the capture
- get_user_config() dict[str, Any] | None¶
Get the configuration enforced by the server for the current user (requires an authenticated user, use init_apikey first)
- hide_capture(tree_uuid: str) dict[str, str]¶
Hide a capture from the index page (requires an authenticated user, use init_apikey first)
- init_apikey(username: str | None = None, password: str | None = None, apikey: str | None = None) None¶
Init the API key for the current session. All the requests against lookyloo after this call will be authenticated.
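Several methods in this reference note that they require an authenticated user. A hedged sketch of that sequence, using hide_capture as the authenticated call: `hide_with_auth` is a hypothetical wrapper, and the up-front check that either an API key or a username/password pair is supplied is an assumption about how init_apikey is meant to be called.

```python
from __future__ import annotations

from typing import Any


def hide_with_auth(client: Any, tree_uuid: str, *, apikey: str | None = None,
                   username: str | None = None,
                   password: str | None = None) -> dict[str, str]:
    """Authenticate the session, then hide the capture from the index page."""
    if apikey is None and (username is None or password is None):
        # Assumption: one of the two credential forms must be given.
        raise ValueError("pass either apikey, or both username and password")
    client.init_apikey(username=username, password=password, apikey=apikey)
    # Every request made after init_apikey is authenticated.
    return client.hide_capture(tree_uuid)
```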
- property is_up: bool¶
Test if the given instance is accessible
- misp_export(tree_uuid: str) dict[str, Any]¶
Export the capture in MISP format
- misp_push(tree_uuid: str) dict[str, Any] | list[dict[str, Any]]¶
Push the capture to a pre-configured MISP instance (requires an authenticated user, use init_apikey first). Note: if the response is a dict, it is an error message. If it is a list, it is a list of MISP events.
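Since misp_push signals failure through the type of its return value rather than an exception, callers need to branch on it. A small sketch of that branch follows; `split_misp_response` is a hypothetical helper, and the contents of the error dict are not assumed.

```python
from __future__ import annotations

from typing import Any


def split_misp_response(
    response: dict[str, Any] | list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], dict[str, Any] | None]:
    """Return (events, error): a dict response is an error, a list holds events."""
    if isinstance(response, dict):
        return [], response
    return list(response), None
```

For example, `events, error = split_misp_response(client.misp_push(tree_uuid))`, followed by a check on `error`.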
- push_from_lacus(capture: dict[str, Any]) dict[str, Any]¶
Push a capture from Lacus to Lookyloo
- Parameters:
capture – The capture to push from Lacus
- rebuild_capture(tree_uuid: str) dict[str, str]¶
Force rebuild a capture (requires an authenticated user, use init_apikey first)
- remove_capture(tree_uuid: str) dict[str, str]¶
Remove a capture; it will no longer be possible to get it by UUID (requires an authenticated user, use init_apikey first)
- send_mail(tree_uuid: str, email: str = '', comment: str | None = None) bool | dict[str, Any]¶
Reports a capture by sending an email to the investigation team
- Parameters:
tree_uuid – UUID of the capture
email – Email of the reporter, used by the analyst to get in touch
comment – Description of the URL, will be given to the analyst
- submit(*, quiet: bool = False, capture_settings: LookylooCaptureSettings | dict[str, Any] | None = None) str¶
- submit(*, quiet: bool = False, url: str | None = None, document_name: str | None = None, document: Path | BytesIO | None = None, browser: Literal['chromium', 'firefox', 'webkit'] | None = None, device_name: str | None = None, user_agent: str | None = None, proxy: str | dict[str, str] | None = None, general_timeout_in_sec: int | None = None, cookies: list[dict[str, Any]] | list[Cookie] | None = None, storage: str | dict[str, Any] | None = None, headers: str | dict[str, str] | None = None, http_credentials: dict[str, str] | HttpCredentialsSettings | None = None, geolocation: dict[str, str | int | float] | GeolocationSettings | None = None, timezone_id: str | None = None, locale: str | None = None, color_scheme: Literal['dark', 'light', 'no-preference', 'null'] | None = None, java_script_enabled: bool = True, viewport: dict[str, str | int] | ViewportSettings | None = None, referer: str | None = None, with_screenshot: bool = True, with_favicon: bool = True, allow_tracking: bool = False, headless: bool = True, init_script: str | None = None, with_trusted_timestamps: bool = False, final_wait: int = 1, listing: bool = False, auto_report: bool | dict[str, str] | None = None, remote_lacus_name: str | None = None, categories: list[str] | None = None, monitor_capture: dict[str, str | bool] | None = None) str
Submit a URL to a lookyloo instance.
- Parameters:
quiet – Returns the UUID only, instead of the whole URL
capture_settings – Settings as a dictionary. It overwrites all other parameters.
url – URL to capture (incompatible with document and document_name)
document_name – Filename of the document to capture (required if document is used)
document – Document to capture itself (requires a document_name)
browser – The browser to use for the capture, must be something Playwright knows
device_name – The name of the device, must be something Playwright knows
user_agent – The user agent the browser will use for the capture
proxy – Capture via a proxy. It can either be the full URL to a SOCKS5 proxy, or the name of a specific proxy configured on a remote lacus instance.
general_timeout_in_sec – The capture will raise a timeout if it takes more than that time
cookies – A list of cookies
storage – The storage as exported from another capture. Can contain the IndexedDB.
headers – The headers to pass to the capture
http_credentials – HTTP Credentials to pass to the capture
geolocation – The geolocation of the browser (latitude/longitude)
timezone_id – The timezone. Warning: it must be a valid timezone (continent/city)
locale – The locale of the browser
color_scheme – The preferred color scheme of the browser (light or dark)
java_script_enabled – If False, no JS will run during the capture.
viewport – The viewport of the browser used for capturing
referer – The referer URL for the capture
with_screenshot – If False, do not take a screenshot at the end of the capture
with_favicon – If False, do not try to find favicons in the rendered page
allow_tracking – If True, attempt to find the overlay asking for the permission to track you and allow everything (best effort, please get in touch if needed)
headless – If False, the browser will be headed; this requires the capture to be done on a desktop.
init_script – JavaScript code to inject in the rendered page, before the page starts loading.
with_trusted_timestamps – If True, and a trusted timestamp provider is configured, trigger a request for trusted timestamps for forensic archival.
final_wait – The wait time after the instrumentation is over. The capture finishes immediately after that wait time.
listing – If False, the capture will not be on the publicly accessible index page of lookyloo
auto_report –
If set, the capture will automatically be forwarded to an analyst (if the instance is configured this way). Pass True if you want to autoreport without any setting, or a dictionary with two keys:
email (required): the email of the submitter, so the analyst can get in touch
comment (optional): a comment about the capture to help the analyst
remote_lacus_name – The name of the remote Lacus instance to use for the capture (only if lookyloo is configured this way)
categories – (v1.37.0+) A list of categories to assign to the capture
monitor_capture – (v1.38.0+) The settings to pass to the monitoring interface. The only required key is “frequency” (hourly/daily).
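With this many optional keywords, a thin wrapper that forwards only the options actually set can keep call sites short. The sketch below uses the keyword form of submit shown in the signature. `submit_url` is a hypothetical helper, and dropping None values before the call is a design choice of this example, not something the library requires.

```python
from __future__ import annotations

from typing import Any


def submit_url(client: Any, url: str, **options: Any) -> str:
    """Submit a URL, forwarding only the options that were explicitly set."""
    # Drop unset (None) options so the library defaults apply.
    explicit = {name: value for name, value in options.items() if value is not None}
    # quiet=True asks for the UUID only, instead of the whole URL.
    return client.submit(quiet=True, url=url, **explicit)
```

A call might look like `submit_url(client, "https://example.com", browser="firefox", listing=False)`, with option names taken from the parameter list above.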
- trigger_modules(tree_uuid: str, force: bool = False) dict[str, Any]¶
Trigger all the available 3rd party modules on the given capture.
- Parameters:
force – Trigger the modules even if they were already triggered today.
- upload_capture(*, quiet: Literal[True], listing: bool = False, full_capture: Path | BytesIO | str | None = None, har: Path | BytesIO | str | None = None, html: Path | BytesIO | str | None = None, last_redirected_url: str | None = None, screenshot: Path | BytesIO | str | None = None, categories: list[str] | None = None) str¶
- upload_capture(*, quiet: Literal[False] = False, listing: bool = False, full_capture: Path | BytesIO | str | None = None, har: Path | BytesIO | str | None = None, html: Path | BytesIO | str | None = None, last_redirected_url: str | None = None, screenshot: Path | BytesIO | str | None = None, categories: list[str] | None = None) tuple[str, dict[str, str]]
Upload a capture from a HAR file and other artifacts
- Parameters:
quiet – Returns the UUID only, instead of the UUID and the potential error / warning messages
listing – If True, the capture is public; otherwise it is private. Overridden if full_capture is given and contains no_index
full_capture – path to the capture made by another instance
har – Harfile of the capture
html – rendered HTML of the capture
last_redirected_url – The landing page of the capture
screenshot – Screenshot of the capture
categories – The categories assigned to the capture
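The two overloads of upload_capture differ only in what they return: a bare UUID with quiet=True, or a (UUID, messages) tuple otherwise. A sketch of the non-quiet form follows, where `upload_har` is a hypothetical helper and the keys of the messages dict are not assumed, only formatted as they come.

```python
from __future__ import annotations

from io import BytesIO
from pathlib import Path
from typing import Any


def upload_har(client: Any, har: Path | BytesIO | str, *,
               categories: list[str] | None = None) -> tuple[str, list[str]]:
    """Upload a HAR file and return the UUID plus any reported messages."""
    # quiet=False selects the overload that returns (uuid, messages).
    uuid, messages = client.upload_capture(quiet=False, har=har, categories=categories)
    problems = [f"{kind}: {text}" for kind, text in messages.items()]
    return uuid, problems
```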