Saturday, August 1, 2015

Deploying Diksha 

 

In Introducing diksha, the features of diksha were demonstrated. This article discusses how to deploy diksha. All actions in this section will incur AWS costs. These instructions are provided without any guarantees.


diksha relies on Amazon's DynamoDB and Simple Workflow (SWF) services.


Figure 1



Figure 1 shows the simplest deployment of diksha, on one server. diksha-client is the command-line interface through which the different users interact with diksha. diksha-server is the workhorse of diksha, keeping track of the different scheduled jobs. diksha-server does not need to be on the same machine as the user. There can also be more than one diksha-server, in which case the load is automatically distributed across the servers.


There are, currently, two ways of getting diksha.

(a) Download the two jars from the latest snapshot (currently 0.0.1)

(b) Compile from source
             git clone https://github.com/milindparikh/diksha.git
             cd diksha
             mvn clean install



Setup


You must set up an SWF domain, certain tables in DynamoDB, and one entry in one table before you can actually start to use diksha. All interactions with diksha by a user are expected to occur through diksha-client. It is expected that you are familiar with the security model of AWS.


If you are an administrator of an AWS account:


As an administrator of an AWS account, you can complete the entire setup and run diksha through the command line as a single user.

1. Your environment variables must be set up.

export AWS_ACCESS_KEY_ID=YOURKEYID
export AWS_SECRET_ACCESS_KEY=YOURKEY


2. Create the domain, the DynamoDB tables, and a config entry in one table

The short way 

       
       java -jar diksha-client-<SNAPSHOT>.jar -adminit

The long way
{   =============== begin long way ================

Diksha Admin

  Step 1: Decide on a domain name
             and create it using -admcd

//                domainName|domainDescription|workflowretentionperiodinday
             -admcd "diksha|dikshadomain|1"

  Step 2: Create the supporting DynamoDB tables

             -admcdt "SchedulerWorkflowState,1,1,clientId,S,loopState,G:S:1:1"
             -admcdt "SchedulerUDF,1,1,functionAlias,S,,"
             -admcdt "SchedulerUDJ,1,1,jobName,S,,"
             -admcdt "SchedulerUDE,1,1,executionId,S,,"
             -admcdt "SchedulerConfig,1,1,configId,S,,"

  Step 3: Create a configuration through -ccfg

     //configid|endPoint|domain|socketTimeout|taskList
    //cf1|https://swf.us-east-1.amazonaws.com|diksha|70000|HelloWorldList
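    For example, assuming -ccfg accepts the pipe-delimited string described by the comments above (this exact invocation is an illustration, not spelled out elsewhere), the sample configuration would be created with:

             -ccfg "cf1|https://swf.us-east-1.amazonaws.com|diksha|70000|HelloWorldList"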

   =============== end long way ================  }




If you are NOT an administrator of an AWS account:


You must ask an IAM administrator to create the following
roles in the AWS account:

1. diksha-admin
2. diksha-designer
3. diksha-user
4. diksha-workflow

The admin is associated with creating
    (i) the domain in Simple Work Flow
   (ii) the tables in DynamoDB
   (iii) the initial config

The designer is associated with creating
  (i) functionAlias
 (ii) jobs

The user is associated with
   (i) running different jobs
  (ii) seeing status of different jobs
 (iii) sometimes canceling jobs

The workflow is associated with the execution of the different jobs on schedule as requested


As an IAM administrator, you can generate the security policies with the following command:


 java -jar diksha-client-<SNAPSHOT>.jar -adminitsps <awsaccountnbr>
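 For example, with the 0.0.1 snapshot and the sample account number used in the policies below (substitute your own 12-digit AWS account number):

 java -jar diksha-client-0.0.1.jar -adminitsps 123456789012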

 Once the designated policies are associated with the relevant users, the diksha-admin user can use the short or the long way exactly like the AWS administrator. The diksha-admin user has limited privileges compared to the AWS administrator.

{ "Version":    "2012-10-17",   "Statement": [ { "Effect": "Allow", "Action": "swf:*", "Resource": "arn:aws:swf:*:123456789012:/domain/dikshaDomain" },{ "Effect": "Allow", "Action": "dynamodb:*", "Resource": "arn:aws:dynamodb:*:123456789012:*" } ] }



YOU ARE NOW READY TO USE diksha!
 

USAGE

Diksha Engine. 

 You must run diksha-engine as an AWS administrator OR as a user who has access to the diksha-workflow policy. You must run diksha-engine on at least one server, but you can run multiple diksha-engines on multiple servers. The load will be distributed (roughly evenly) across the servers.

java -jar diksha-engine-0.0.1.jar
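On a dedicated server you would typically keep the engine running in the background; a minimal sketch (the log file name is just an illustration):

nohup java -jar diksha-engine-0.0.1.jar > diksha-engine.log 2>&1 &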



 

Diksha Designer 

Function Aliases

 -lcfg cf1 -cf "cool|L|arn:aws:lambda:us-east-1:123456789012:function:echocool"
creates a function alias called "cool" pointing to the lambda function arn:aws:lambda:us-east-1:123456789012:function:echocool

Predefined Jobs
-lcfg cf1  -cj "runcooljobeverymin|cool|contextmin|0 0-59 * * * *|2"
creates a job named runcooljobeverymin that runs the function alias cool, with context contextmin, every minute, 2 times.
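These flags are passed to diksha-client. A full invocation, using the 0.0.1 jar name as an illustration, would look like:

java -jar diksha-client-0.0.1.jar -lcfg cf1 -cf "cool|L|arn:aws:lambda:us-east-1:123456789012:function:echocool"
java -jar diksha-client-0.0.1.jar -lcfg cf1 -cj "runcooljobeverymin|cool|contextmin|0 0-59 * * * *|2"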


Diksha User

Running jobs 

-lcfg cf1  -ej "runcooljobeverymin"
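As with the designer commands, these flags are passed to diksha-client. A sketch of running the predefined job and then checking on it (the executionId is the sample id from the introductory post below; passing -lcfg alongside -lse is an assumption):

java -jar diksha-client-0.0.1.jar -lcfg cf1 -ej "runcooljobeverymin"
java -jar diksha-client-0.0.1.jar -lcfg cf1 -lse 81cad398-feb0-4e74-95dd-40101ea33ca7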




Security Policies


The security policies generated above are listed here for reference. Of course, your account number will be different.


Admin
{ "Version":    "2012-10-17",   "Statement": [ { "Effect": "Allow", "Action": "swf:*", "Resource": "arn:aws:swf:*:123456789012:/domain/dikshaDomain" },{ "Effect": "Allow", "Action": "dynamodb:*", "Resource": "arn:aws:dynamodb:*:123456789012:*" } ] }

Designer
{ "Version":    "2012-10-17",   "Statement": { "Effect": "Allow", "Action":  [ "dynamodb:GetItem","dynamodb:BatchGetItem","dynamodb:Query","dynamodb:Scan","dynamodb:PutItem","dynamodb:UpdateItem","dynamodb:DeleteItem","dynamodb:BatchWriteItem" ] , "Resource":  [ "arn:aws:dynamodb:*:123456789012:table/SchedulerUDF","arn:aws:dynamodb:*:123456789012:table/SchedulerUDF/index/*","arn:aws:dynamodb:*:123456789012:table/SchedulerUDJ","arn:aws:dynamodb:*:123456789012:table/SchedulerUDJ/index/*" ]  } }


User
{ "Version":    "2012-10-17",   "Statement": [ { "Effect": "Allow", "Action":  [ "swf:CountOpenWorkflowExecutions","swf:CountClosedWorkflowExecutions","swf:DescribeActivityType","swf:DescribeDomain","swf:DescribeWorkflowExecution","swf:DescribeWorkflowType","swf:GetWorkflowExecutionHistory","swf:ListActivityTypes","swf:ListClosedWorkflowExecutions","swf:ListOpenWorkflowExecutions","swf:RequestCancelWorkflowExecution","swf:SignalWorkflowExecution","swf:StartWorkflowExecution","swf:TerminateWorkflowExecution" ] , "Resource": "arn:aws:swf:*:123456789012:/domain/dikshaDomain" },{ "Effect": "Allow", "Action":  [ "dynamodb:GetItem","dynamodb:BatchGetItem","dynamodb:Query","dynamodb:Scan" ] , "Resource":  [ "arn:aws:dynamodb:*:123456789012:table/SchedulerConfig","arn:aws:dynamodb:*:123456789012:table/SchedulerConfig/index/*","arn:aws:dynamodb:*:123456789012:table/SchedulerUDF","arn:aws:dynamodb:*:123456789012:table/SchedulerUDF/index/*","arn:aws:dynamodb:*:123456789012:table/SchedulerUDJ","arn:aws:dynamodb:*:123456789012:table/SchedulerUDJ/index/*","arn:aws:dynamodb:*:123456789012:table/SchedulerWorkflowState","arn:aws:dynamodb:*:123456789012:table/SchedulerWorkflowState/index/*" ]  },{ "Effect": "Allow", "Action":  [ "dynamodb:GetItem","dynamodb:BatchGetItem","dynamodb:Query","dynamodb:Scan","dynamodb:PutItem","dynamodb:UpdateItem","dynamodb:DeleteItem","dynamodb:BatchWriteItem" ] , "Resource":  [ "arn:aws:dynamodb:*:123456789012:table/SchedulerUDE","arn:aws:dynamodb:*:123456789012:table/SchedulerUDE/index/*" ]  } ] }


Workflow
{ "Version":    "2012-10-17",   "Statement": [ { "Effect": "Allow", "Action": "swf:*", "Resource": "arn:aws:swf:*:123456789012:/domain/dikshaDomain" },{ "Effect": "Allow", "Action":  [ "dynamodb:GetItem","dynamodb:BatchGetItem","dynamodb:Query","dynamodb:Scan" ] , "Resource":  [ "arn:aws:dynamodb:*:123456789012:table/SchedulerConfig","arn:aws:dynamodb:*:123456789012:table/SchedulerConfig/index/*" ]  },{ "Effect": "Allow", "Action":  [ "dynamodb:GetItem","dynamodb:BatchGetItem","dynamodb:Query","dynamodb:Scan","dynamodb:PutItem","dynamodb:UpdateItem","dynamodb:DeleteItem","dynamodb:BatchWriteItem" ] , "Resource":  [ "arn:aws:dynamodb:*:123456789012:table/SchedulerUDE","arn:aws:dynamodb:*:123456789012:table/SchedulerUDE/index/*","arn:aws:dynamodb:*:123456789012:table/SchedulerWorkflowState","arn:aws:dynamodb:*:123456789012:table/SchedulerWorkflowState/index/*" ]  },{ "Effect": "Allow", "Action": "lambda:InvokeFunction", "Resource": "arn:aws:lambda:*:123456789012:*:*" } ] }






Saturday, July 25, 2015

Introducing diksha -- An AWS Lambda Function Scheduler


diksha is a scalable scheduler that can be used to schedule AWS Lambda functions. It is available here.

The 30 second pitch

1. diksha schedules in a cron-like manner.
2. diksha enables the end user to optionally specify the number of executions and the start and end times of those executions
3. diksha plays nicely with the security model of AWS
4. diksha scales on the cloud.
5. diksha is command line driven
6. diksha is open sourced under the friendly Apache 2.0  License.
7. diksha just requires java 7 and a couple of jars...everything else is on the cloud

Sounds interesting? Read on...

The two minute tour


diksha has two components: a server side and a client side. The client side is command-line driven.

A quintessential command is scheduling a (Lambda) function for execution.

java -jar diksha-client-<SNAPSHOT>.jar  -cf "cool|L|arn:aws:lambda:us-east-1:123456789012:function:echocool"
creates an alias called "cool" pointing to the lambda function arn:aws:lambda:us-east-1:123456789012:function:echocool


java -jar diksha-client-<SNAPSHOT>.jar -ef "cool|somecontext|0 */1 * * * *|10"

implies

execute lambda ("L") function echocool (arn:aws:lambda:us-east-1:123456789012:function:echocool) passing context ("somecontext") every minute (cron expression: 0 */1 * * * *) for 10 times

More expressive

"cool|somecontext|0 */1 * * * *|10|25.07.2015T14:32:00-0700|25.07.2015T14:35:00-0700



Function Alias      lambda function alias to execute
Context             context to be passed to the lambda function
CronExpression      defines the periodicity of when the function should be executed
RepeatTimes         how many times the function should be called
StartTime           when the function should start executing
EndTime              when the function should automatically stop executing


As this example shows, there is a conflict between RepeatTimes and the combination of StartTime and EndTime, because of the CronExpression. The CronExpression says "run this every minute". RepeatTimes says "do this 10 times". Therefore, under normal circumstances, this function needs to run for at least 10 minutes (in practice a little longer). However, the difference between StartTime and EndTime is 3 minutes, which is not enough time for the function to run the full 10 cycles. diksha will terminate the function at EndTime. Please note the exactness of StartTime and EndTime, corresponding to the SimpleDateFormat dd.MM.yyyy'T'HH:mm:ssZ.

More Details 

The execution of the command line gives an executionId associated with the schedule such as "81cad398-feb0-4e74-95dd-40101ea33ca7".

The ability to see into the execution of the schedule is through --list-status-execution <executionId> or  -lse <executionId>

clientID : 0a2a5de3-599f-4dd8-b4b9-303176e36d09
     Launch Parameters
           Function: (arn:aws:lambda:us-east-1:123456789012:function:echocool) with context = somecontext
                 CronExpression  : 0 */2 * * * *
                 RepeatTimes     : 10
                 StartTimeDate   : null
                 EndTimeDate     : null
      Current State
            status of loop       : FINISH
            # of times executed  : 10
            Last Executed @      : Thu Jul 30 06:01:59 PDT 2015
            Next Proposed Time @ : Thu Jul 30 06:04:00 PDT 2015

                  ActivityTaskCompleted       Thu Jul 30 05:44:01 PDT 2015    
                  ActivityTaskCompleted       Thu Jul 30 05:46:00 PDT 2015    
                  ActivityTaskCompleted       Thu Jul 30 05:48:02 PDT 2015    
                  ActivityTaskCompleted       Thu Jul 30 05:50:03 PDT 2015    
                  ActivityTaskCompleted       Thu Jul 30 05:52:00 PDT 2015    
                  ActivityTaskCompleted       Thu Jul 30 05:54:01 PDT 2015    
                  ActivityTaskCompleted       Thu Jul 30 05:56:01 PDT 2015    
                  ActivityTaskCompleted       Thu Jul 30 05:58:02 PDT 2015    
                  ActivityTaskCompleted       Thu Jul 30 06:00:01 PDT 2015    
                  ActivityTaskCompleted       Thu Jul 30 06:02:00 PDT 2015    




The ability to cancel a current execution is through --cancel-execution <executionId>|reason or -cane <executionId>|reason

Cancellation is done on a best-effort basis by diksha and is NOT immediate. More details later on the actual functioning of diksha.
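For illustration, with the executionId from above (the reason string is made up, and the pipe-delimited argument is quoted like the other diksha arguments):

java -jar diksha-client-<SNAPSHOT>.jar -cane "81cad398-feb0-4e74-95dd-40101ea33ca7|no longer needed"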


Some Sustainability Tips

While the previous two sections were essentially a whirlwind tour of diksha, this section goes more into how to use diksha sustainably.

creating a function alias is done through  --create-function or -cf

-cf "cool|L|arn:aws:lambda:us-east-1:123456789012:function:echocool"  creates a friendly function alias called cool pointing to the actual lambda function.

-cf "supercool|L|arn:aws:lambda:us-east-1:123456789012:function:echocool" creates another function alias called supercool

creating a predefined job is done through --create-job or -cj

 -cj "runcooljobeverymin|cool|contextmin|0 0-59 * * * *|2"

creates a job named  runcooljobeverymin running the function alias cool with context of contextmin every min for 2 times

running a predefined job is done though --execute-job or -ej

-ej "runcooljobeverymin"

A little bit of setup upfront saves a lot of downstream effort. There is currently no particular validation, and it is possible to overwrite both the function and the job with something completely different.


User Scalability


diksha is designed to handle several executions, each of which may run completely independently of the others in terms of its parameters. For example, you can have several executions running every hour, certain executions running every week, and others running monthly or quarterly.

-laes or --list-active-jobs lists all the active jobs on the scheduler

     CronExpression Loop Count   Next Scheduled Time                                     ExecutionId
       0 */3 * * * *       2     2015-08-01T03:18:00.000Z     1b7bba8a-8181-481d-86e0-aa6a35ab0da7
         0 0 * * * *       1     2015-08-01T04:00:00.000Z     287a2fe5-66a9-4acd-bbe1-57b6c143248a
       0 */1 * * * *       3     2015-08-01T03:16:00.000Z     74f0ff3b-c8f1-452f-8cac-fe0146af2e2e
       0 */5 * * * *       2     2015-08-01T03:20:00.000Z     ad49eed1-db56-42d4-986d-bb95500243d8
       0 */2 * * * *       2     2015-08-01T03:16:00.000Z     bfc3beb2-1b09-41ab-8191-eacdaa5d7e2c



After some time

      CronExpression Loop Count   Next Scheduled Time                                     ExecutionId
       0 */3 * * * *       4     2015-08-01T03:21:00.000Z     1b7bba8a-8181-481d-86e0-aa6a35ab0da7
         0 0 * * * *       1     2015-08-01T04:00:00.000Z     287a2fe5-66a9-4acd-bbe1-57b6c143248a
       0 */1 * * * *       6     2015-08-01T03:19:00.000Z     74f0ff3b-c8f1-452f-8cac-fe0146af2e2e
       0 */5 * * * *       2     2015-08-01T03:20:00.000Z     ad49eed1-db56-42d4-986d-bb95500243d8
       0 */2 * * * *       4     2015-08-01T03:20:00.000Z     bfc3beb2-1b09-41ab-8191-eacdaa5d7e2c




















Saturday, June 20, 2015

AWS Lambda -- Spooky Action at a distance

I am a big fan of AWS Lambda, a cloud-managed, function-based micro-service by Amazon.

The first in this series of articles is about how to enable indexing of objects landing in S3 without any other action from the publisher (except, of course, landing the file in S3).

The code for this article is at https://github.com/milindparikh/storiesofaws/tree/master/spooky-action-at-a-distance


THE UBER USER STORY


As an analyst, I want to search files landing in S3 by their content as soon as the files land in S3.


AN ARCHITECTURAL PROTOTYPE


Figure 1

The analyst has access to CloudSearch. The publisher has access to the S3 bucket. S3 itself can send notifications when a file gets inserted into the bucket, but there is no direct way to connect the two. The architectural use case is to illustrate the use of Lambda for the uber user story. Lambda is NOT appropriate in all use cases and needs additional components.

AWS Lambda to the rescue.

Figure 2


As shown in Figure 2, an AWS Lambda function can serve as the essential glue between the storage and search.

It gives the illusion of having search results automatically reflect the puts in the S3 bucket.
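The wiring itself can be sketched with the standard AWS CLI (bucket name, function name and statement id below are placeholders; the repository's code does this via templates). S3 must be allowed to invoke the function, and the bucket must be told where to send its ObjectCreated events:

aws lambda add-permission --function-name index-to-cloudsearch \
    --statement-id s3-invoke --action lambda:InvokeFunction \
    --principal s3.amazonaws.com --source-arn arn:aws:s3:::my-landing-bucket

aws s3api put-bucket-notification-configuration --bucket my-landing-bucket \
    --notification-configuration file://notification.json

Here notification.json would map s3:ObjectCreated:* events to the Lambda function's ARN.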


THE GORY DETAILS

 

Security


    An AWS Lambda function can only operate in the context of the role given to it. Also, only specific roles can invoke a Lambda function... etc.... all of this has to be configured either in the console or through the SDK, or you might just want to take a peek at the code.
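As a rough illustration of the execution-role side (the role name is a placeholder; the repository's templates generate the real policies), the role has to trust the Lambda service before any permissions are attached to it:

aws iam create-role --role-name lambda-s3-to-cloudsearch \
    --assume-role-policy-document '{
      "Version": "2012-10-17",
      "Statement": [ { "Effect": "Allow",
                       "Principal": { "Service": "lambda.amazonaws.com" },
                       "Action": "sts:AssumeRole" } ] }'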


The Core Function


Both the core function and the role generation are template-driven in the code. This means that you can change the core function to suit the needs of your particular search config.



My CloudSearch Domain




Figure 3
Figure 3 shows my CloudSearch domain config. It indexes two fields (name and desc) and ignores the rest (notice the * field).