API reference

exception pylookyloo.AuthError
class pylookyloo.CaptureSettings(**kwargs)

The capture settings that can be passed to Lookyloo.

class pylookyloo.CompareSettings(**kwargs)

The settings that can be passed to the compare method on the Lookyloo side to filter out some differences.

exception pylookyloo.PyLookylooError

Lookyloo

class pylookyloo.Lookyloo(root_url: str | None = None, useragent: str | None = None, *, proxies: dict[str, str] | None = None, verify: bool | str = True)
ai_export(tree_uuid: str) dict[str, Any]

Export the capture in a format suitable for feeding to a model.

compare_captures(capture_left: str, capture_right: str, /, *, compare_settings: CompareSettings | dict[str, Any] | None = None) dict[str, Any]

Compares two captures

Parameters:
  • capture_left – UUID of the capture to compare from

  • capture_right – UUID of the capture to compare to

  • compare_settings – The settings for the comparison itself (what to ignore without marking the captures as different)

enqueue(url: str | None = None, quiet: bool = False, document: Path | BytesIO | None = None, document_name: str | None = None, **kwargs) str

Enqueue a URL.

Parameters:
  • url – URL to enqueue

  • quiet – Returns the UUID only, instead of the whole URL

  • document – A document to submit to Lookyloo. It can be anything supported by a browser.

  • document_name – The name of the document (only if you passed a pseudofile).

  • kwargs – accepts all the parameters supported by Lookyloo.capture

get_apikey(username: str, password: str) dict[str, str]

Get the API key for the given user.

get_capture_stats(tree_uuid: str) dict[str, Any]

Get statistics of the capture

get_categories_captures(category: str | None = None) list[str] | dict[str, list[str]] | None

Get uuids for a specific category or all categorized uuids if category is None

Parameters:

category – The category according to which the uuids are to be returned

get_comparables(tree_uuid: str) dict[str, Any]

Get comparable information from the capture

get_complete_capture(capture_uuid: str) BytesIO

Returns a zip file that contains the screenshot, the HAR, the rendered HTML, and the cookies.

Parameters:

capture_uuid – UUID of the capture
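Since the complete capture comes back as a zip archive held in a BytesIO, it can be inspected in memory with the standard zipfile module. The following is a minimal sketch; the helper names `list_capture_files` and `read_capture_file` are hypothetical, and the names of the files inside the archive are not documented here, so the code lists them rather than assuming any.

```python
import zipfile
from io import BytesIO


def list_capture_files(archive: BytesIO) -> list[str]:
    """Return the names of the files stored in a zipped capture."""
    archive.seek(0)  # read from the start, whatever the buffer position was
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


def read_capture_file(archive: BytesIO, name: str) -> bytes:
    """Return the raw bytes of one member of the zipped capture."""
    archive.seek(0)
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name)


# Intended use (needs a reachable Lookyloo instance, so it is left commented):
# archive = lookyloo.get_complete_capture(capture_uuid)
# for name in list_capture_files(archive):
#     print(name, len(read_capture_file(archive, name)))
```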

get_cookies(capture_uuid: str) list[dict[str, str]]

Returns the complete cookie jar.

Parameters:

capture_uuid – UUID of the capture

get_data(capture_uuid: str) BytesIO

Returns the downloaded data.

Parameters:

capture_uuid – UUID of the capture

get_favicon_occurrences(favicon: str | BytesIO, *, cached_captures_only: bool = True, limit: int = 20, offset: int = 0) dict[str, Any]

Returns all the captures containing the favicon.

Parameters:
  • favicon – Favicon to lookup. Either the hash, or the file in a BytesIO (hash will be generated on the fly)

  • cached_captures_only – If False, Lookyloo will attempt to re-cache the missing captures. It might take some time.

  • limit – The max amount of entries to return.

  • offset – The offset to start from, useful for pagination.
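The limit and offset pair shown above also appears on the other *_occurrences methods, so one generic loop can page through any of them. This is a sketch under two assumptions that the reference does not confirm: that offset counts entries (not pages), and that a page shorter than limit means the end was reached. The shape of the returned dict is not documented here, so the caller supplies `extract`, a hypothetical function that pulls the list of entries out of one response.

```python
from typing import Any, Callable, Iterator


def iter_pages(
    fetch: Callable[..., dict[str, Any]],
    extract: Callable[[dict[str, Any]], list[Any]],
    *args: Any,
    limit: int = 20,
    **kwargs: Any,
) -> Iterator[Any]:
    """Yield every entry by calling fetch repeatedly with a growing offset."""
    offset = 0
    while True:
        page = fetch(*args, limit=limit, offset=offset, **kwargs)
        entries = extract(page)
        yield from entries
        if len(entries) < limit:
            return  # assumption: a short page is the last one
        offset += limit  # assumption: offset is expressed in entries


# Intended use (the "response" key is a placeholder, check the real payload):
# for entry in iter_pages(lookyloo.get_hostname_occurrences,
#                         lambda page: page["response"],
#                         "example.com", limit=50):
#     print(entry)
```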

get_favicons(capture_uuid: str) dict[str, Any]

Returns the potential favicons of the capture.

Parameters:

capture_uuid – UUID of the capture

get_hash_occurrences(h: str, *, with_urls_occurrences: bool = False, cached_captures_only: bool = True, limit: int = 20, offset: int = 0) dict[str, Any]

Returns the base64 body related to the hash, and a list of all the captures containing that hash.

Parameters:
  • h – sha512 to search

  • with_urls_occurrences – If true, add details about the URLs from the URL nodes in the tree.

  • cached_captures_only – If False, Lookyloo will attempt to re-cache the missing captures. It might take some time.

  • limit – The max amount of entries to return.

  • offset – The offset to start from, useful for pagination.

get_hashes(capture_uuid: str, algorithm: str = 'sha512', hashes_only: bool = True) StringIO

Returns all the hashes of all the bodies (including the embedded contents)

Parameters:
  • capture_uuid – UUID of the capture

  • algorithm – The algorithm of the hashes

  • hashes_only – If False, will also return the URLs related to the hashes

get_hostname_occurrences(hostname: str, *, with_urls_occurrences: bool = False, cached_captures_only: bool = True, limit: int = 20, offset: int = 0) dict[str, Any]

Returns all the captures containing the hostname.

Parameters:
  • hostname – Hostname to lookup

  • with_urls_occurrences – If true, add details about the related URLs.

  • cached_captures_only – If False, Lookyloo will attempt to re-cache the missing captures. It might take some time.

  • limit – The max amount of entries to return.

  • offset – The offset to start from, useful for pagination.

get_hostnames(capture_uuid: str) dict[str, Any]

Returns all the hostnames seen during the capture.

Parameters:

capture_uuid – UUID of the capture

get_html(capture_uuid: str) StringIO

Returns the rendered HTML as it is in the browser after the page loaded.

Parameters:

capture_uuid – UUID of the capture

get_html_as_markdown(capture_uuid: str) StringIO

Returns the rendered HTML as it is in the browser after the page loaded, converted to Markdown.

Parameters:

capture_uuid – UUID of the capture

get_info(tree_uuid: str) dict[str, Any]

Get information about the capture (url, timestamp, user agent)

get_ip_occurrences(ip: str, *, with_urls_occurrences: bool = False, cached_captures_only: bool = True, limit: int = 20, offset: int = 0) dict[str, Any]

Returns all the captures containing the IP address.

Parameters:
  • ip – IP to lookup

  • with_urls_occurrences – If true, add details about the related URLs.

  • cached_captures_only – If False, Lookyloo will attempt to re-cache the missing captures. It might take some time.

  • limit – The max amount of entries to return.

  • offset – The offset to start from, useful for pagination.

get_ips(capture_uuid: str) dict[str, Any]

Returns all the IPs seen during the capture.

Parameters:

capture_uuid – UUID of the capture

get_modules_responses(tree_uuid: str) dict[str, Any]

Returns information from the 3rd party modules

Parameters:

tree_uuid – UUID of the capture

get_recent_captures(timestamp: str | datetime | float | None = None) list[str]

Gets the uuids of the most recent captures

Parameters:

timestamp – Oldest timestamp to consider

get_redirects(capture_uuid: str) dict[str, Any]

Returns the initial redirects.

Parameters:

capture_uuid – UUID of the capture

get_remote_lacuses() list[dict[str, Any]]

Get the list of Lacus instances configured on the Lookyloo instance

get_screenshot(capture_uuid: str) BytesIO

Returns the screenshot.

Parameters:

capture_uuid – UUID of the capture

get_stats() dict[str, Any]

Returns statistics about the Lookyloo instance.

get_status(tree_uuid: str) dict[str, Any]

Get the status of a capture:
  • -1: Unknown capture.

  • 0: The capture is queued up but not processed yet.

  • 1: The capture is ready.

  • 2: The capture is ongoing and will be ready soon.
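The status codes above lend themselves to a small polling loop that waits for a capture to finish. The sketch below is not part of pylookyloo: `wait_until_ready` is a hypothetical helper, and the key under which the code is returned (`status_code` here) is an assumption, since this reference only says that get_status returns a dict. Raising on -1 is a design choice of the sketch.

```python
import time
from typing import Any, Callable

STATUS_UNKNOWN = -1
STATUS_QUEUED = 0
STATUS_READY = 1
STATUS_ONGOING = 2


def wait_until_ready(
    get_status: Callable[[str], dict[str, Any]],
    tree_uuid: str,
    *,
    timeout: float = 120.0,
    interval: float = 5.0,
    status_key: str = "status_code",  # assumption: not stated in this reference
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the capture is ready; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        code = get_status(tree_uuid).get(status_key, STATUS_UNKNOWN)
        if code == STATUS_READY:
            return True
        if code == STATUS_UNKNOWN:
            raise LookupError(f"unknown capture: {tree_uuid}")
        if time.monotonic() >= deadline:
            return False  # still queued or ongoing when time ran out
        sleep(interval)


# Intended use (needs a reachable Lookyloo instance):
# if wait_until_ready(lookyloo.get_status, tree_uuid):
#     info = lookyloo.get_info(tree_uuid)
```

Passing the bound method rather than the client keeps the helper easy to exercise with a stub.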

get_storage(capture_uuid: str) dict[str, Any]

Returns the complete storage state.

Parameters:

capture_uuid – UUID of the capture

get_takedown_information(capture_uuid: str, filter_contacts: Literal[True]) list[str]
get_takedown_information(capture_uuid: str, filter_contacts: Literal[False] = False) list[dict[str, Any]]

Returns information required to request a takedown for a capture

Parameters:
  • capture_uuid – UUID of the capture

  • filter_contacts – If True, will only return the contact emails and filter out the invalid ones.

get_url_occurrences(url: str, *, with_urls_occurrences: bool = False, cached_captures_only: bool = True, limit: int = 20, offset: int = 0) dict[str, Any]

Returns all the captures containing the URL.

Parameters:
  • url – URL to lookup

  • with_urls_occurrences – If true, add details about the URLs from the URL nodes in the tree.

  • cached_captures_only – If False, Lookyloo will attempt to re-cache the missing captures. It might take some time.

  • limit – The max amount of entries to return.

  • offset – The offset to start from, useful for pagination.

get_urls(capture_uuid: str) dict[str, Any]

Returns all the URLs seen during the capture.

Parameters:

capture_uuid – UUID of the capture

get_user_config() dict[str, Any] | None

Get the configuration enforced by the server for the current user (requires an authenticated user, use init_apikey first)

hide_capture(tree_uuid: str) dict[str, str]

Hide a capture from the index page (requires an authenticated user, use init_apikey first)

init_apikey(username: str | None = None, password: str | None = None, apikey: str | None = None) None

Init the API key for the current session. All the requests against lookyloo after this call will be authenticated.
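Several methods in this reference require an authenticated user, so init_apikey has to run before them. A minimal sketch of that ordering, using hide_capture as the authenticated call; `hide_captures` is a hypothetical helper, and `client` is expected to be a Lookyloo instance (typed as Any so the sketch can be exercised with a stub).

```python
from typing import Any, Iterable


def hide_captures(
    client: Any, apikey: str, tree_uuids: Iterable[str]
) -> dict[str, dict[str, str]]:
    """Authenticate once, then hide each capture and collect the responses."""
    # Every request made after this call is authenticated.
    client.init_apikey(apikey=apikey)
    return {tree_uuid: client.hide_capture(tree_uuid) for tree_uuid in tree_uuids}
```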

property is_up: bool

Test if the given instance is accessible

misp_export(tree_uuid: str) dict[str, Any]

Export the capture in MISP format

misp_push(tree_uuid: str) dict[str, Any] | list[dict[str, Any]]

Push the capture to a pre-configured MISP instance (requires an authenticated user, use init_apikey first). Note: if the response is a dict, it is an error message. If it is a list, it is a list of MISP events.
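Because misp_push signals failure through the type of its return value, callers need to branch on it. A small sketch of that check; `split_misp_push_result` is a hypothetical helper, and nothing is assumed about the keys inside either payload.

```python
from typing import Any, Optional, Union


def split_misp_push_result(
    result: Union[dict[str, Any], list[dict[str, Any]]],
) -> tuple[list[dict[str, Any]], Optional[dict[str, Any]]]:
    """Return (events, error): a list means success, a dict means an error."""
    if isinstance(result, list):
        return result, None
    return [], result


# Intended use (needs an authenticated session):
# events, error = split_misp_push_result(lookyloo.misp_push(tree_uuid))
# if error is not None:
#     print("push failed:", error)
```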

push_from_lacus(capture: dict[str, Any]) dict[str, Any]

Push a capture from Lacus to Lookyloo

Parameters:

capture – The capture to push from Lacus

rebuild_capture(tree_uuid: str) dict[str, str]

Force rebuild a capture (requires an authenticated user, use init_apikey first)

remove_capture(tree_uuid: str) dict[str, str]

Remove a capture. It will no longer be possible to get it by UUID (requires an authenticated user, use init_apikey first)

send_mail(tree_uuid: str, email: str = '', comment: str | None = None) bool | dict[str, Any]

Reports a capture by sending an email to the investigation team

Parameters:
  • tree_uuid – UUID of the capture

  • email – Email of the reporter, used by the analyst to get in touch

  • comment – Description of the URL, will be given to the analyst

submit(*, quiet: bool = False, capture_settings: LookylooCaptureSettings | dict[str, Any] | None = None) str
submit(*, quiet: bool = False, url: str | None = None, document_name: str | None = None, document: Path | BytesIO | None = None, browser: Literal['chromium', 'firefox', 'webkit'] | None = None, device_name: str | None = None, user_agent: str | None = None, proxy: str | dict[str, str] | None = None, general_timeout_in_sec: int | None = None, cookies: list[dict[str, Any]] | list[Cookie] | None = None, storage: str | dict[str, Any] | None = None, headers: str | dict[str, str] | None = None, http_credentials: dict[str, str] | HttpCredentialsSettings | None = None, geolocation: dict[str, str | int | float] | GeolocationSettings | None = None, timezone_id: str | None = None, locale: str | None = None, color_scheme: Literal['dark', 'light', 'no-preference', 'null'] | None = None, java_script_enabled: bool = True, viewport: dict[str, str | int] | ViewportSettings | None = None, referer: str | None = None, with_screenshot: bool = True, with_favicon: bool = True, allow_tracking: bool = False, headless: bool = True, init_script: str | None = None, with_trusted_timestamps: bool = False, final_wait: int = 1, listing: bool = False, auto_report: bool | dict[str, str] | None = None, remote_lacus_name: str | None = None, categories: list[str] | None = None, monitor_capture: dict[str, str | bool] | None = None) str

Submit a URL to a lookyloo instance.

Parameters:
  • quiet – Returns the UUID only, instead of the whole URL

  • capture_settings – Settings as a dictionary. It overwrites all other parameters.

  • url – URL to capture (incompatible with document and document_name)

  • document_name – Filename of the document to capture (required if document is used)

  • document – Document to capture itself (requires a document_name)

  • browser – The browser to use for the capture, must be something Playwright knows

  • device_name – The name of the device, must be something Playwright knows

  • user_agent – The user agent the browser will use for the capture

  • proxy – Capture via a proxy. It can either be the full URL to a SOCKS5 proxy, or the name of a specific proxy configured on a remote lacus instance.

  • general_timeout_in_sec – The capture will raise a timeout if it takes more than that time

  • cookies – A list of cookies

  • storage – The storage as exported from another capture. Can contain the IndexedDB.

  • headers – The headers to pass to the capture

  • http_credentials – HTTP Credentials to pass to the capture

  • geolocation – The geolocation of the browser (latitude/longitude)

  • timezone_id – The timezone. Warning: it must be a valid timezone (continent/city)

  • locale – The locale of the browser

  • color_scheme – The preferred color scheme of the browser (light or dark)

  • java_script_enabled – If False, no JS will run during the capture.

  • viewport – The viewport of the browser used for capturing

  • referer – The referer URL for the capture

  • with_screenshot – If False, do not take a screenshot at the end of the capture

  • with_favicon – If False, do not try to find favicons in the rendered page

  • allow_tracking – If True, attempt to find the overlay asking for the permission to track you and allow everything (best effort, please get in touch if needed)

  • headless – If False, the browser will be headed, which requires the capture to be done on a desktop.

  • init_script – JavaScript code to inject in the rendered page, before the page starts loading.

  • with_trusted_timestamps – If True, and a trusted timestamp provider is configured, trigger a request for trusted timestamps for forensic archival.

  • final_wait – The wait time after the instrumentation is over. The capture finishes immediately after that wait time.

  • listing – If False, the capture will not be on the publicly accessible index page of lookyloo

  • auto_report

    If set, the capture will automatically be forwarded to an analyst (if the instance is configured this way). Pass True if you want to autoreport without any setting, or a dictionary with two keys:

    • email (required): the email of the submitter, so the analyst can get in touch

    • comment (optional): a comment about the capture to help the analyst

  • remote_lacus_name – The name of the remote Lacus instance to use for the capture (only if lookyloo is configured this way)

  • categories – (v1.37.0+) A list of categories to assign to the capture

  • monitor_capture – (v1.38.0+) The settings to pass to the monitoring interface. The only required key is “frequency” (hourly/daily).
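The parameter list above carries a few constraints that are easy to get wrong: url is incompatible with document and document_name, those two go together, and the auto_report dictionary needs an email. The sketch below checks them client-side before calling submit. `build_submit_kwargs` and its `reporter_email` / `report_comment` arguments are hypothetical names, not part of pylookyloo, and the server may well enforce these rules differently.

```python
from io import BytesIO
from pathlib import Path
from typing import Any, Optional, Union


def build_submit_kwargs(
    url: Optional[str] = None,
    document: Optional[Union[Path, BytesIO]] = None,
    document_name: Optional[str] = None,
    reporter_email: Optional[str] = None,
    report_comment: Optional[str] = None,
    **extra: Any,
) -> dict[str, Any]:
    """Validate the documented constraints and return kwargs for submit()."""
    if url is not None and (document is not None or document_name is not None):
        raise ValueError("url is incompatible with document and document_name")
    if (document is None) != (document_name is None):
        raise ValueError("document and document_name must be given together")
    if report_comment is not None and reporter_email is None:
        raise ValueError("auto_report needs an email when a comment is given")

    kwargs: dict[str, Any] = dict(extra)  # e.g. listing=False, browser="firefox"
    if url is not None:
        kwargs["url"] = url
    if document is not None:
        kwargs["document"] = document
        kwargs["document_name"] = document_name
    if reporter_email is not None:
        report = {"email": reporter_email}
        if report_comment is not None:
            report["comment"] = report_comment
        kwargs["auto_report"] = report
    return kwargs


# Intended use (needs a reachable Lookyloo instance):
# uuid = lookyloo.submit(quiet=True, **build_submit_kwargs(url="https://example.com"))
```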

trigger_modules(tree_uuid: str, force: bool = False) dict[str, Any]

Trigger all the available 3rd party modules on the given capture.

Parameters:

force – Trigger the modules even if they were already triggered today.

upload_capture(*, quiet: Literal[True], listing: bool = False, full_capture: Path | BytesIO | str | None = None, har: Path | BytesIO | str | None = None, html: Path | BytesIO | str | None = None, last_redirected_url: str | None = None, screenshot: Path | BytesIO | str | None = None, categories: list[str] | None = None) str
upload_capture(*, quiet: Literal[False] = False, listing: bool = False, full_capture: Path | BytesIO | str | None = None, har: Path | BytesIO | str | None = None, html: Path | BytesIO | str | None = None, last_redirected_url: str | None = None, screenshot: Path | BytesIO | str | None = None, categories: list[str] | None = None) tuple[str, dict[str, str]]

Upload a capture via har-file and others

Parameters:
  • quiet – Returns the UUID only, instead of the UUID and the potential error / warning messages

  • listing – If True, the capture is public, otherwise private. Overwritten if full_capture is given and it contains no_index

  • full_capture – path to the capture made by another instance

  • har – Harfile of the capture

  • html – rendered HTML of the capture

  • last_redirected_url – The landing page of the capture

  • screenshot – Screenshot of the capture

  • categories – The categories assigned to the capture
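The two upload_capture overloads return different shapes depending on quiet: a bare UUID string, or a tuple of the UUID and a dict of messages. Code that accepts either can normalize them as sketched below; `normalize_upload_result` is a hypothetical helper, not part of pylookyloo.

```python
from typing import Union


def normalize_upload_result(
    result: Union[str, tuple[str, dict[str, str]]],
) -> tuple[str, dict[str, str]]:
    """Return (uuid, messages) whichever overload produced the result."""
    if isinstance(result, str):
        # quiet=True: only the UUID was returned, so there are no messages.
        return result, {}
    uuid, messages = result
    return uuid, messages


# Intended use (needs a reachable Lookyloo instance):
# uuid, messages = normalize_upload_result(lookyloo.upload_capture(har=har_path))
# for key, text in messages.items():
#     print(key, text)
```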