--- title: "Getting Started" author: "Dyfan Jones" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started} %\VignetteEngine{knitr::rmarkdown} %\usepackage[UTF-8]{inputenc} --- The `noctua` package aims to make it easier to work with data stored in [`AWS Athena`](https://aws.amazon.com/athena/). `noctua` package attempts to provide three levels of interacting with AWS Athena: * Low - level API: This provides more finer tuning of `AWS Athena` backend utilising the AWS SDK [`paws`](https://github.com/paws-r/paws). This includes configuring [`AWS Athena Work Groups`](https://aws.amazon.com/about-aws/whats-new/2019/02/athena_workgroups/) to assuming different roles within `AWS` when connecting to `AWS Athena`. * [DBI interface](https://dbi.r-dbi.org/): This is the primary goal of `noctua`, by providing a `DBI` interface to `AWS Athena`. Users are able to interact with `AWS Athena` utilising familiar functions and methods they have used for other Databases from R. * [dplyr interface](https://dbplyr.tidyverse.org/): As `dplyr` is coming more popular, `noctua` aims to give `dplyr` a seamless interface into `AWS Athena`. # Installing `noctua`: As `noctua` utilising the R AWS SDK `paws` the installation of `noctua` is pretty straight forward: ```r # cran version install.packages("noctua") # Dev version remotes::install_github("dyfanjones/noctua") ``` ## Docker Example: To help with users wishing to run `noctua` in a [docker](https://hub.docker.com/), a simple docker file has been created [here](https://github.com/DyfanJones/noctua/blob/master/docker/Dockerfile). To set up the docker please refer to [link](https://repost.aws/knowledge-center/codebuild-temporary-credentials-docker). For demo purposes we will use the [example docker](https://github.com/DyfanJones/noctua/blob/master/docker/Dockerfile) and run it locally: ```console # build docker image docker build . -t noctua # start container with aws credentials passed from local docker run \ -e AWS_ACCESS_KEY_ID="$(aws configure get aws_access_key_id)" \ -e AWS_SECRET_ACCESS_KEY="$(aws configure get aws_secret_access_key)" \ -e AWS_SESSION_TOKEN="$(aws configure get aws_session_token)" \ -e AWS_DEFAULT_REGION="$(aws configure get region)" \ -it noctua ``` **NOTE:** `readr` isn't required for `noctua`, however it has been included in the docker file to improve performance when querying AWS Athena. # Usage: ## Low - Level API: ```r library(DBI) library(noctua) con <- dbConnect(athena()) # list all current work groups in AWS Athena list_work_groups(con) # Create a new work group create_work_group(con, "demo_work_group", description = "This is a demo work group", tags = tag_options(key= "demo_work_group", value = "demo_01")) ``` ## DBI: ```r library(DBI) con <- dbConnect(noctua::athena()) # Get metadata dbGetInfo(con) # $profile_name # [1] "default" # # $s3_staging # [1] ######## NOTE: Please don't share your S3 bucket to the public # # $dbms.name # [1] "default" # # $work_group # [1] "primary" # # $poll_interval # NULL # # $encryption_option # NULL # # $kms_key # NULL # # $expiration # NULL # # $region_name # [1] "eu-west-1" # # $paws # [1] "0.1.6" # # $noctua # [1] "1.5.1" # create table to AWS Athena dbWriteTable(con, "iris", iris) dbGetQuery(con, "select * from iris limit 10") # Info: (Data scanned: 860 Bytes) # sepal_length sepal_width petal_length petal_width species # 1: 5.1 3.5 1.4 0.2 setosa # 2: 4.9 3.0 1.4 0.2 setosa # 3: 4.7 3.2 1.3 0.2 setosa # 4: 4.6 3.1 1.5 0.2 setosa # 5: 5.0 3.6 1.4 0.2 setosa # 6: 5.4 3.9 1.7 0.4 setosa # 7: 4.6 3.4 1.4 0.3 setosa # 8: 5.0 3.4 1.5 0.2 setosa # 9: 4.4 2.9 1.4 0.2 setosa # 10: 4.9 3.1 1.5 0.1 setosa ``` ## dplyr: ```r library(dplyr) athena_iris <- tbl(con, "iris") athena_iris %>% select(species, sepal_length, sepal_width) %>% head(10) %>% collect() # Info: (Data scanned: 860 Bytes) # # A tibble: 10 x 3 # species sepal_length sepal_width # # 1 setosa 5.1 3.5 # 2 setosa 4.9 3 # 3 setosa 4.7 3.2 # 4 setosa 4.6 3.1 # 5 setosa 5 3.6 # 6 setosa 5.4 3.9 # 7 setosa 4.6 3.4 # 8 setosa 5 3.4 # 9 setosa 4.4 2.9 # 10 setosa 4.9 3.1 ``` # Useful Links: * [SQL](https://docs.aws.amazon.com/athena/latest/ug/functions-operators-reference-section.html) * [AWS Athena performance tips](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/) * [AWS Athena User Guide](https://docs.aws.amazon.com/athena/latest/ug/athena-ug.pdf)