FAQs
Cloud Archaeologist
What is a Cloud Archaeologist?

Cloud Archaeology is the practice of extracting data and context from cloud computing ecosystems to support asset management and to provide historical context for cost management, security, and compliance use cases. While the micro-epochs of the “devops” movement cannot quite be considered “prehistory”, the layers of fossilization left behind by rapidly shifting tech stacks, attrition, forgetfulness, or laziness are financial, compliance, and security pain waiting to surface.
At Cloud Archaeologist, we build tools that reduce the complexity and tediousness of accessing cloud data sources.
BOYD
What is BOYD?
BOYD is a software application that runs on your local computer and provides a web interface for downloading selected columns of AWS Cost and Usage Report (CUR) data from Parquet files in S3, and for classifying and analyzing that data.
How does BOYD work?
BOYD uses your local AWS command line (CLI) profiles to access CUR Parquet files in S3. At minimum, BOYD settings require the CUR bucket name, the region where the bucket is deployed, and an AWS CLI profile that has access to the bucket.
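The sketch below illustrates the three minimal settings. It models them and checks the profile name against the profiles defined in an AWS CLI config file. In `~/.aws/config`, named profiles appear as `[profile name]` sections and the default profile as `[default]`. The `BoydSettings` class and both helper functions are hypothetical names for this example, not part of BOYD. The region check is a deliberately loose pattern, not an authoritative list.

```python
import configparser
import re
from dataclasses import dataclass

# Loose, assumed pattern for names like "us-east-1" or "us-gov-west-1"; not exhaustive.
REGION_PATTERN = re.compile(r"^[a-z]{2}(-gov)?-[a-z]+-\d+$")


@dataclass(frozen=True)
class BoydSettings:
    """Hypothetical container for the minimal settings: bucket, region, CLI profile."""
    cur_bucket: str
    region: str
    profile: str


def profiles_from_config(config_text: str) -> list[str]:
    """Return profile names found in the text of an AWS CLI config file."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    names = []
    for section in parser.sections():
        if section == "default":
            names.append("default")
        elif section.startswith("profile "):
            names.append(section[len("profile "):].strip())
        # Other sections (for example "sso-session ...") are not profiles and are skipped.
    return names


def validate_settings(settings: BoydSettings, available_profiles: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the settings look usable."""
    problems = []
    if not settings.cur_bucket:
        problems.append("CUR bucket is missing")
    if not REGION_PATTERN.match(settings.region):
        problems.append(f"region {settings.region!r} does not look like an AWS region")
    if settings.profile not in available_profiles:
        problems.append(f"profile {settings.profile!r} is not defined in the AWS CLI config")
    return problems


if __name__ == "__main__":
    sample_config = """
[default]
region = us-east-1

[profile billing]
region = us-west-2
"""
    profiles = profiles_from_config(sample_config)
    print(profiles)
    print(validate_settings(BoydSettings("my-cur-bucket", "us-east-1", "billing"), profiles))
```

Checking the profile against the config file up front catches a typo before any S3 request is attempted. Credential validity is still only known once a real AWS call succeeds.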
Once configured, BOYD identifies the CUR report version (1.0, 2.0, or FOCUS) and the columns available in your dataset. There may be hundreds of columns, and depending on your use case you will likely need only a fraction of them. Select the schema columns of interest. If you are unsure about a column’s contents, the S3 CUR Explorer lets you browse summary statistics for columns directly from your reports in S3 to preview their values.
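The sketch below illustrates version detection and column previews. It guesses the report flavor from column names and computes a simple per-column summary. The marker columns are assumptions, not BOYD's documented detection logic:

- FOCUS: PascalCase names such as `BilledCost` and `ChargePeriodStart`.
- CUR 2.0: account-name columns and a map-typed `resource_tags` column.
- CUR 1.0: snake_case names without those markers.

The helper names are hypothetical.

```python
from collections import Counter

# Assumed marker columns for each report flavor (illustrative, not exhaustive).
FOCUS_MARKERS = {"BilledCost", "BillingPeriodStart", "ChargePeriodStart"}
CUR2_MARKERS = {"bill_payer_account_name", "line_item_usage_account_name", "resource_tags"}


def detect_cur_version(columns) -> str:
    """Guess "FOCUS", "2.0", "1.0", or "unknown" from a list of column names."""
    cols = set(columns)
    if FOCUS_MARKERS <= cols:
        return "FOCUS"
    if "identity_line_item_id" in cols:
        # Both CUR 1.0 and 2.0 use snake_case; 2.0 is assumed to add the marker columns.
        return "2.0" if cols & CUR2_MARKERS else "1.0"
    return "unknown"


def select_columns(available, wanted):
    """Keep requested columns that exist, preserving the dataset's column order."""
    wanted = set(wanted)
    return [c for c in available if c in wanted]


def summarize(values, top=3):
    """Row count, null count, distinct count, and most frequent values for one column."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "top_values": counts.most_common(top),
    }
```

A summary like this is usually enough to decide whether a column is worth the extra width. Examples include a column that is almost entirely null, or one with a single distinct value.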
CUR reports are typically massive in both width and length: lots of columns and lots of rows. Schema selection reduces the width, while the date columns (_start_date/_end_date/PeriodStart/PeriodEnd) affect the length. For example, suppose you need only monthly reporting visibility and use other means for daily trends. Including only bill_billing_period_start_date/BillingPeriodStart and excluding the other _start_date/_end_date/PeriodStart/PeriodEnd columns will reduce the size of the dataset BOYD collects. Not surprisingly, including line_item_usage_start_date/ChargePeriodStart on an hourly CUR report will dramatically increase the size of your collected dataset.
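The effect of dropping date columns can be shown with a small roll-up. This sketch assumes the collected dataset is aggregated over the selected columns, so rows that differ only in unselected columns merge into one. That is an assumption for illustration. The sample rows and the choice of `line_item_unblended_cost` as the summed column are made up for the example.

```python
from collections import defaultdict


def collect(rows, selected_columns, cost_column="line_item_unblended_cost"):
    """Sum cost over the selected columns; rows differing only elsewhere are merged."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[c] for c in selected_columns)
        totals[key] += row[cost_column]
    return dict(totals)


# Made-up sample: two resources billed on three days of one billing period.
SAMPLE_ROWS = [
    {
        "bill_billing_period_start_date": "2024-01-01",
        "line_item_usage_start_date": f"2024-01-0{day}",
        "line_item_resource_id": resource,
        "line_item_unblended_cost": cost,
    }
    for day in (1, 2, 3)
    for resource, cost in (("i-0aaa", 1.5), ("vol-0bbb", 0.25))
]

# Keeping the usage date yields one row per resource per day.
daily = collect(SAMPLE_ROWS, ["line_item_usage_start_date", "line_item_resource_id"])
# Keeping only the billing period yields one row per resource.
monthly = collect(SAMPLE_ROWS, ["bill_billing_period_start_date", "line_item_resource_id"])
print(len(daily), len(monthly))  # 6 2
```

The same arithmetic scales quickly. Take a resource billed every hour of a 31-day month. It can contribute up to 31 × 24 = 744 rows when the hourly usage start date is kept, 31 rows at daily granularity, and 1 row when only the billing period start is kept.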
With your schema selected, return to the home page and select a billing period, and BOYD will collect the data from S3. From there you can explore your inventory, identify classification strategies that add context to your data, and try out the reporting and analysis capabilities.
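Collecting one billing period amounts to selecting the right Parquet objects under the CUR prefix. The sketch below filters S3 object keys by billing period. It assumes the default Hive-style layouts: `year=YYYY/month=M/` for CUR 1.0 Parquet reports, and `data/BILLING_PERIOD=YYYY-MM/` for Data Exports (CUR 2.0 and FOCUS). Verify these against your own bucket. The function name is hypothetical, and listing the keys (for example with an S3 list call) is left out so the example stays self-contained.

```python
import re
from datetime import date

# Assumed default layouts (verify against your bucket):
#   CUR 1.0:        <prefix>/<report>/<report>/year=2024/month=1/<report>-00001.snappy.parquet
#   CUR 2.0/FOCUS:  <prefix>/<export>/data/BILLING_PERIOD=2024-01/<export>-00001.snappy.parquet
CUR1_PARTITION = re.compile(r"/year=(\d{4})/month=(\d{1,2})/")
DATA_EXPORTS_PARTITION = re.compile(r"/BILLING_PERIOD=(\d{4})-(\d{2})/")


def keys_for_billing_period(keys, period: date):
    """Return Parquet object keys belonging to the billing period that contains `period`."""
    selected = []
    for key in keys:
        if not key.endswith(".parquet"):
            continue  # skip manifests and other non-data objects
        match = DATA_EXPORTS_PARTITION.search(key) or CUR1_PARTITION.search(key)
        if match and (int(match.group(1)), int(match.group(2))) == (period.year, period.month):
            selected.append(key)
    return selected
```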
What is needed to start a trial?
BOYD offers a 14-day free trial so you can assess how well it works with your dataset. Before starting the trial, make sure you have a CUR report configured that meets the requirements, along with AWS command line access and an AWS CLI profile configured to access your CUR bucket and files. For CUR collection, you will need your CUR bucket name, the region where the bucket is deployed, and the AWS CLI profile that has access to the bucket. To use Dig Mode, you will also need the read permissions granted by the AWS managed “SecurityAudit” policy.
Supported Operating Systems
- macOS: supported
- Windows: support in development
What are the BOYD AWS Cost and Usage Report configuration requirements?
- An AWS CUR report configured with:
  - Include resource IDs enabled;
  - Data export delivery options: Parquet format;
  - Report Versioning: Overwrite existing data export file;
  - Daily time granularity (recommended); hourly granularity is supported but results in much larger datasets and reduced performance;
  - A total CUR Parquet size under 1 GB per month (recommended); mileage will vary depending on the schema columns collected, the cloud services and architectures in use, and local system capabilities;
- AWS command line access to your Cost and Usage Report (CUR) S3 bucket; see “What AWS permissions are required?” for details.
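
For a legacy CUR (1.0) report, the report definition returned by the AWS Cost and Usage Report API includes fields named `Format`, `TimeUnit`, `ReportVersioning`, and `AdditionalSchemaElements`. One way to retrieve it is boto3's `cur` client and `describe_report_definitions`. The hypothetical `check_cur_definition` helper below compares one such definition, as a plain dictionary, with the requirements above. CUR 2.0 and FOCUS exports are configured through AWS Data Exports, which uses a different structure that this sketch does not cover.

```python
def check_cur_definition(definition: dict):
    """Compare a legacy CUR report definition dict with the BOYD CUR requirements.

    Returns (problems, notes): problems are unmet requirements, notes are advisory.
    """
    problems, notes = [], []
    # "Include resource IDs" corresponds to the RESOURCES additional schema element.
    if "RESOURCES" not in definition.get("AdditionalSchemaElements", []):
        problems.append("Include resource IDs is not enabled")
    if definition.get("Format") != "Parquet":
        problems.append("delivery format is not Parquet")
    if definition.get("ReportVersioning") != "OVERWRITE_REPORT":
        problems.append("report versioning does not overwrite the existing file")
    time_unit = definition.get("TimeUnit")
    if time_unit == "HOURLY":
        notes.append("hourly granularity is supported but produces much larger datasets")
    elif time_unit != "DAILY":
        notes.append(f"time unit {time_unit!r}: daily granularity is recommended")
    return problems, notes
```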
What AWS permissions are required?
- BOYD requires AWS permissions to get and list the CUR S3 bucket and its files. Dig Mode, which can be used to investigate resources and apply discovered context to your results, requires additional permissions for AWS Get/Describe APIs, the AWS CloudTrail events API, and the AWS Config API. The AWS managed “SecurityAudit” policy should cover the permissions Dig Mode needs.
- An example AWS IAM policy showing the minimum permissions required for CUR access (it does not include the “SecurityAudit” policy permissions):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketAcl",
        "s3:GetBucketPolicy"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_CUR_BUCKET>"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:GetObjectAttributes"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_CUR_BUCKET>/*"
      ]
    }
  ]
}
```
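The policy splits its permissions across two resources. Bucket-level actions such as `s3:ListBucket` apply to the bucket ARN, while object-level actions such as `s3:GetObject` apply to `bucket/*`. The sketch below makes that split checkable with a deliberately simplified matcher. It only looks for a matching Allow statement. It ignores Deny statements, `NotAction`/`NotResource`, conditions, and any other policies attached to the principal, so it is not a substitute for IAM's own evaluation or the IAM policy simulator. The function names and the bucket name `example-cur-bucket` are hypothetical.

```python
import fnmatch


def cur_read_policy(bucket: str) -> dict:
    """Build the example CUR read policy for a given bucket name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketAcl", "s3:GetBucketPolicy"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectAcl", "s3:GetObjectAttributes"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


def _as_list(value):
    """IAM allows Action/Resource to be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Simplified check: True if any Allow statement matches both action and resource."""
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        # Action names are compared case-insensitively; ARNs case-sensitively.
        action_ok = any(
            fnmatch.fnmatchcase(action.lower(), pattern.lower())
            for pattern in _as_list(statement.get("Action", []))
        )
        resource_ok = any(
            fnmatch.fnmatchcase(resource, pattern)
            for pattern in _as_list(statement.get("Resource", []))
        )
        if action_ok and resource_ok:
            return True
    return False
```

For example, `is_allowed(cur_read_policy("example-cur-bucket"), "s3:GetObject", "arn:aws:s3:::example-cur-bucket")` is false because object reads are granted only on `example-cur-bucket/*`. Listing the bucket itself is allowed.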
What APIs does Dig Mode support?
Dig Mode has been tested against the following services and APIs. CloudTrail API requests attempt lookups using the resource ARN and/or resource ID. (Note: GovCloud regions are not supported in BOYD v1.0.1.)
Dig Mode Supported APIs
Product – Resource Type | AWS Config | API |
---|---|---|
AWS Amplify – app | ✓ | get_app |
AWS Certificate Manager – acm | ✓ | describe_certificate |
AWS Certificate Manager – acm_pca | ✓ | describe_certificate_authority |
AWS CloudFormation – stack | | describe_stacks |
Amazon CloudFront – distribution | ✓ | get_distribution |
Amazon CloudFront – function | ✓ | get_function |
AmazonCloudWatch – flow-log | ✓ | describe_flow_logs |
AmazonCloudWatch – instance | ✓ | describe_instances |
AmazonCloudWatch – logs | | describe_log_groups |
CodeBuild – project | | batch_get_projects |
AWS CodePipeline – codepipeline | ✓ | get_pipeline |
Amazon DynamoDB – table | ✓ | describe_table |
Amazon Elastic Compute Cloud – instance | ✓ | describe_instances |
Amazon Elastic Compute Cloud – nat | ✓ | describe_nat_gateways |
Amazon Elastic Compute Cloud – snapshot | | describe_snapshots |
Amazon Elastic Compute Cloud – volume | ✓ | describe_volumes |
Amazon EC2 Container Registry (ECR) – repository | ✓ | describe_repositories |
Amazon Elastic Container Registry Public – public-repository | ✓ | describe_tasks |
Amazon EFS – file-system | | describe_file_systems |
Amazon ES – es-domain | | describe_elasticsearch_domain |
Amazon ElastiCache – cache-cluster | | describe_cache_clusters |
AWS Glue – crawler | | get_crawler |
AWS Glue – database | ✓ | get_database |
AWS Glue – table | | |
AWS Glue – job | ✓ | get_job |
Amazon Inspector – instance | ✓ | list_findings |
Amazon Inspector – lambda | ✓ | list_findings |
Amazon Kinesis – kinesis | | describe_stream |
Amazon Kinesis Firehose – firehose | | describe_delivery_stream |
AWS Key Management Service – key | ✓ | describe_key |
Amazon Neptune – neptune-db | | describe_db_instances |
Amazon SageMaker – notebook-instance | ✓ | describe_notebook_instance |
Amazon Virtual Private Cloud – client-vpn-endpoint | | describe_client_vpn_endpoints |
Amazon Virtual Private Cloud – eip | ✓ | describe_addresses |
Amazon Virtual Private Cloud – network_interface | ✓ | describe_network_interfaces |
Amazon Virtual Private Cloud – transit-gateway-attachment | | describe_transit_gateway_attachments |
Amazon Virtual Private Cloud – vpc-endpoint | ✓ | describe_vpc_endpoints |
Elastic Load Balancing – loadbalancerv2_app | ✓ | describe_load_balancers |
Elastic Load Balancing – loadbalancerv2_net | ✓ | describe_load_balancers |
Elastic Load Balancing – loadbalancer | ✓ | describe_load_balancers |
AWS Lambda – function | ✓ | get_function |
Amazon QuickSight – quicksight-user | | describe_user |
Amazon RDS – db | | describe_db_instances |
Amazon RDS – cluster | | describe_db_clusters |
Amazon RDS – cluster-snapshot | | describe_db_cluster_snapshots |
Amazon Route 53 – healthcheck | ✓ | get_health_check |
Amazon Route 53 – hostedzone | ✓ | get_hosted_zone |
Amazon Simple Storage Service – bucket | ✓ | |
AWS Secrets Manager – secret | ✓ | describe_secret |
Amazon Simple Notification Service | ✓ | |
AWS WAF – wafv1-webacl | ✓ | |
AWS WAF – wafv1-webacl+rule | ✓ | |
AWS WAF – wafv1-regional-webacl | ✓ | |
AWS WAF – wafv1-regional-webacl+rule | ✓ | |
AWS WAF – wafv2-regional-webacl | ✓ | |
AWS WAF – wafv2-global-webacl | ✓ | |
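
The CloudTrail lookups "using the ARN and/or resource ID" can be sketched as follows. Given a CUR resource value, build one lookup for the full ARN and one for the bare resource ID parsed from it. Each candidate is shaped as a single-item `LookupAttributes` list with `AttributeKey` set to `ResourceName`, the form accepted by the CloudTrail `LookupEvents` API (for example boto3's `lookup_events`), which takes one lookup attribute per request. The helper names are hypothetical, and this is not BOYD's implementation.

```python
def resource_id_from_arn(value: str):
    """Return the resource ID portion of an ARN, or None if `value` is not an ARN.

    ARN format: arn:partition:service:region:account-id:resource, where `resource`
    may be "type/id", "type:id", or just "id".
    """
    parts = value.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        return None
    resource = parts[5]
    # Split at whichever separator ("/" or ":") appears first in the resource part.
    separators = [i for i in (resource.find("/"), resource.find(":")) if i != -1]
    if not separators:
        return resource  # e.g. an S3 bucket ARN: arn:aws:s3:::my-bucket
    return resource[min(separators) + 1:]


def cloudtrail_lookup_attributes(value: str):
    """Candidate LookupAttributes lists (one attribute each): ARN first, then bare ID."""
    candidates = [value]
    resource_id = resource_id_from_arn(value)
    if resource_id and resource_id not in candidates:
        candidates.append(resource_id)
    return [[{"AttributeKey": "ResourceName", "AttributeValue": c}] for c in candidates]
```

Trying the ARN first and falling back to the bare ID covers services whose CloudTrail events record one form but not the other. A plain resource ID such as `i-0abc123` produces a single candidate.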