FAQs

Cloud Archaeologist

What is a Cloud Archaeologist?

Cloud Archaeology involves extracting data and context out of cloud computing ecosystems for asset management and providing historical relevance for cost management, security and compliance related use cases. While the micro-epochs of the “devops” movement cannot quite be considered “prehistory”, the layers of fossilization produced by rapidly shifting tech stacks, attrition, forgetfulness, or laziness are financial, compliance and security pain waiting to surface.

At Cloud Archaeologist, we build tools that reduce the complexity and tediousness of accessing cloud data sources.

BOYD

What is BOYD?

BOYD is a software application that runs on your local computer and provides a web interface for downloading selected columns of AWS Cost and Usage Report (CUR) data from Parquet files in S3, then classifying and analyzing that data.

How does BOYD work?

BOYD uses local AWS command line session profiles to access CUR parquet files in S3. Minimal BOYD settings require a CUR bucket, the region where the bucket is deployed and an AWS command line profile that has access to the bucket.
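To make the three minimal settings concrete, here is a rough sketch of listing the Parquet files in a CUR bucket through a named CLI profile. It assumes the boto3 library; the source does not say how BOYD itself talks to S3. The names CurSettings, parquet_keys and iter_bucket_keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class CurSettings:
    """Hypothetical container for the three minimal settings."""
    bucket: str    # S3 bucket that receives the CUR export
    region: str    # region where that bucket is deployed
    profile: str   # local AWS CLI profile with access to the bucket


def parquet_keys(keys: Iterable[str]) -> List[str]:
    """Keep only object keys that look like Parquet files."""
    return [k for k in keys if k.lower().endswith(".parquet")]


def iter_bucket_keys(settings: CurSettings, prefix: str = "") -> Iterator[str]:
    """Yield every object key under `prefix` using the configured profile."""
    import boto3  # imported lazily so the helpers above work without boto3

    session = boto3.Session(profile_name=settings.profile,
                            region_name=settings.region)
    paginator = session.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=settings.bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

With real values in place of the placeholders, parquet_keys(iter_bucket_keys(CurSettings("<YOUR_CUR_BUCKET>", "<REGION>", "<PROFILE>"))) would return the Parquet keys that the profile can see.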

Once configured, BOYD will identify the CUR report version (1.0, 2.0 or FOCUS) and the columns available in your dataset. There may be hundreds of columns and you will likely only need a fraction of them depending on your use case. Select the schema columns of interest. If you are not sure about a column’s contents, the S3 CUR Explorer lets you browse summary statistics for columns directly from your reports in S3 to preview their values.

CUR reports are typically massive in width and length: lots of columns and lots of rows. Schema selection reduces the width, while the date columns you include (_end_date/_start_date/PeriodStart/PeriodEnd) affect the length. For example, if you only need monthly reporting and use other means for daily trend visibility, including only bill_billing_period_start_date/BillingPeriodStart and excluding the other _start_date/_end_date/PeriodStart/PeriodEnd columns will reduce the size of the dataset BOYD collects. Not surprisingly, including line_item_usage_start_date/ChargePeriodStart on an hourly CUR report will dramatically increase the size of your collected dataset.
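The effect of the date columns on length can be illustrated with a small sketch. It assumes that collection groups line items by the selected columns and sums the cost, which is an illustration of the idea and not a description of BOYD's internals. The collapse function and the sample rows are made up; the column names are standard CUR columns.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def collapse(rows: Iterable[Row], keep: List[str], cost_col: str) -> List[Row]:
    """Group rows by the kept columns and sum the cost column."""
    totals: Dict[Tuple, float] = defaultdict(float)
    for row in rows:
        key = tuple(row[c] for c in keep)
        totals[key] += float(row[cost_col])
    return [dict(zip(keep, key), **{cost_col: total})
            for key, total in totals.items()]


# Made-up sample: three hourly line items in one billing period.
sample = [
    {
        "bill_billing_period_start_date": "2024-05-01",
        "line_item_usage_start_date": f"2024-05-01T0{hour}:00:00Z",
        "line_item_product_code": "AmazonEC2",
        "line_item_unblended_cost": 0.25,
    }
    for hour in range(3)
]

COST = "line_item_unblended_cost"
monthly_cols = ["bill_billing_period_start_date", "line_item_product_code"]
hourly_cols = monthly_cols + ["line_item_usage_start_date"]

monthly = collapse(sample, monthly_cols, COST)
hourly = collapse(sample, hourly_cols, COST)
print(len(monthly), len(hourly))  # prints "1 3"
```

Keeping only the billing period column collapses the three hourly rows into one, while adding the usage start date keeps all three.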

With your schema selected, return to the home page, select a billing period and BOYD will collect the data from S3. From there you can start to explore your inventory, identify classification strategies to apply more context to your data, and explore the reporting and analysis capabilities.

What is needed to start a trial?

BOYD has an initial 14-day free trial period so users can assess how well it works with their dataset. Before starting the free trial, make sure you have a CUR report configured that meets the requirements, and that you have AWS command line access with an AWS command line interface profile configured to access your CUR report bucket and files. For BOYD CUR collection you will need to know your CUR bucket, the region where the bucket is deployed and the AWS command line profile that has access to the bucket. To use Dig Mode, you will also need the read permissions granted by the AWS managed “SecurityAudit” policy.

Supported Operating Systems

  • macOS supported
  • (Windows support in development)

What are the BOYD AWS Cost and Usage Report configuration requirements?

  • AWS CUR reporting with:
    • Include resource IDs enabled
    • Data export delivery options: Parquet format
    • Report versioning: Overwrite existing data export file
    • Daily reporting is recommended; hourly reporting is supported but will result in much larger datasets and reduced performance
  • A total CUR Parquet size of less than 1 GB/month is recommended, but mileage will vary depending on the schema columns collected, the cloud services/architectures in use and local system capabilities
  • AWS command line access to your Cost and Usage Report (CUR) S3 bucket; see “What AWS permissions are required?” for details

What AWS permissions are required?

  • BOYD requires AWS permissions to Get/List the CUR S3 bucket and its files. Dig Mode, which can be used to investigate and apply discovered context to your results, requires additional permissions for AWS Get/Describe APIs, the AWS CloudTrail events API, and the AWS Config API. The AWS managed “SecurityAudit” policy should cover the necessary Dig Mode permissions.
  • An example AWS IAM policy showing the minimum permissions required for CUR access (does not include the “SecurityAudit” policy permissions):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketAcl",
                "s3:GetBucketPolicy"
            ],
            "Resource": [
                "arn:aws:s3:::<YOUR_CUR_BUCKET>"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectAcl",
                "s3:GetObjectAttributes"
            ],
            "Resource": [
                "arn:aws:s3:::<YOUR_CUR_BUCKET>/*"
            ]
        }
    ]
}
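
If you prefer to generate the policy for a specific bucket, a minimal sketch like the following fills in the bucket name and emits valid JSON. The function name cur_read_policy is hypothetical; the actions and Sids are copied from the example policy.

```python
import json


def cur_read_policy(bucket: str) -> str:
    """Return the minimum CUR access policy for `bucket` as a JSON string."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permissions apply to the bucket ARN itself.
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetBucketAcl",
                    "s3:GetBucketPolicy",
                ],
                "Resource": [bucket_arn],
            },
            {
                # Object-level permissions apply to every key in the bucket.
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectAcl",
                    "s3:GetObjectAttributes",
                ],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }
    return json.dumps(policy, indent=4)
```

Calling print(cur_read_policy("<YOUR_CUR_BUCKET>")) with your own bucket name produces a document you can paste into the IAM console.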

What APIs does Dig Mode support?

Dig Mode has been tested against the following services and APIs. CloudTrail API requests attempt lookups using the ARN and/or resource ID value. (Note: GovCloud regions are not supported in BOYD v1.0.1.)
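
As an illustration of looking up by ARN and/or resource ID, the sketch below builds the request arguments for CloudTrail's lookup_events call, once for the full ARN and once for the trailing resource ID. This is an assumption about how such a lookup could be structured, not BOYD's actual code; the function names are hypothetical and the ARN parsing is deliberately simplified.

```python
from typing import Dict, List


def resource_id_from_arn(arn: str) -> str:
    """Return the trailing resource ID of an ARN (after the last '/' or ':')."""
    # ARN layout: arn:partition:service:region:account-id:resource
    resource = arn.split(":", 5)[-1]
    return resource.replace(":", "/").rsplit("/", 1)[-1]


def lookup_requests(arn: str) -> List[Dict]:
    """Build lookup_events keyword arguments for the ARN and its resource ID."""
    candidates = [arn]
    resource_id = resource_id_from_arn(arn)
    if resource_id != arn:
        candidates.append(resource_id)
    # lookup_events accepts a single lookup attribute per request.
    return [
        {"LookupAttributes": [
            {"AttributeKey": "ResourceName", "AttributeValue": value}
        ]}
        for value in candidates
    ]
```

Each returned dictionary could be passed to a boto3 CloudTrail client as client.lookup_events(**kwargs), trying the ARN first and falling back to the resource ID if no events are found.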

Dig Mode Supported APIs
| Product Code | Config | API |
|---|---|---|
| AWS Amplify – app | | get_app |
| AWS Certificate Manager – acm | | describe_certificate |
| AWS Certificate Manager – acm_pca | | describe_certificate_authority |
| AWS CloudFormation – stack | | describe_stacks |
| Amazon CloudFront – distribution | | get_distribution |
| Amazon CloudFront – function | | get_function |
| AmazonCloudWatch – flow-log | | describe_flow_logs |
| AmazonCloudWatch – instance | | describe_instances |
| AmazonCloudWatch – logs | | describe_log_groups |
| CodeBuild – project | | batch_get_projects |
| AWS CodePipeline – codepipeline | | get_pipeline |
| Amazon DynamoDB – table | | describe_table |
| Amazon Elastic Compute Cloud – instance | | describe_instances |
| Amazon Elastic Compute Cloud – nat | | describe_nat_gateways |
| Amazon Elastic Compute Cloud – snapshot | | describe_snapshots |
| Amazon Elastic Compute Cloud – volume | | describe_volumes |
| Amazon EC2 Container Registry (ECR) – repository | | describe_repositories |
| Amazon Elastic Container Registry Public – public-repository | | describe_tasks |
| Amazon EFS – file-system | | describe_file_systems |
| Amazon ES – es-domain | | describe_elasticsearch_domain |
| Amazon ElastiCache – cache-cluster | | describe_cache_clusters |
| AWS Glue – crawler | | get_crawler |
| AWS Glue – database | | get_database |
| AWS Glue – table | | |
| AWS Glue – job | | get_job |
| Amazon Inspector – instance | | list_findings |
| Amazon Inspector – lambda | | list_findings |
| Amazon Kinesis – kinesis | | describe_stream |
| Amazon Kinesis Firehose – firehose | | describe_delivery_stream |
| AWS Key Management Service – key | | describe_key |
| Amazon Neptune – neptune-db | | describe_db_instances |
| Amazon SageMaker – notebook-instance | | describe_notebook_instance |
| Amazon Virtual Private Cloud – client-vpn-enpdoint | | describe_client_vpn_endpoints |
| Amazon Virtual Private Cloud – eip | | describe_addresses |
| Amazon Virtual Private Cloud – network_interface | | describe_network_interfaces |
| Amazon Virtual Private Cloud – transit-gateway-attachment | | describe_transit_gateway_attachments |
| Amazon Virtual Private Cloud – vpc-endpoint | | describe_vpc_endpoints |
| Elastic Load Balancing – loadbalancerv2_app | | describe_load_balancers |
| Elastic Load Balancing – loadbalancerv2_net | | describe_load_balancers |
| Elastic Load Balancing – loadbalancer | | describe_load_balancers |
| AWS Lambda – function | | get_function |
| Amazon QuickSight – quicksight-user | | describe_user |
| Amazon RDS – db | | describe_db_instances |
| Amazon RDS – cluster | | describe_db_clusters |
| Amazon RDS – cluster-snapshot | | describe_db_cluster_snapshots |
| Amazon Route 53 – healthcheck | | get_health_check |
| Amazon Route 53 – hostedzone | | get_hosted_zone |
| Amazon Simple Storage Service – bucket | | |
| AWS Secrets Manager – secret | | describe_secret |
| Amazon Simple Notification Service | | |
| AWS WAF – wafv1-webacl | | |
| AWS WAF – wafv1-webacl+rule | | |
| AWS WAF – wafv1-regional-webacl | | |
| AWS WAF – wafv1-regional-webacl+rule | | |
| AWS WAF – wafv2-regional-webacl | | |
| AWS WAF – wafv2-global-webacl | | |