In September I was invited to speak at the Ottawa Elastic Search Meetup Group at Kinaxis. It was great seeing new faces at the meetup. I gave an introduction to Elasticsearch so that new user group members could walk through the entire stack of Elasticsearch, Logstash, and Kibana. I hope to speak there again soon.

Here are my slides for anyone interested.

Transcript

1. A Gentle Intro to ElasticSearch. Taswar Bhatti, System/Solutions Architect (Ottawa), GEMALTO
2. Who am I? • System/Solution Architect at Gemalto Ottawa (Microsoft MVP) • I am somewhat of a language geek; I speak a few languages • Kind of like Neo (I KNOW KUNG FU) for languages – Merhaba – नमस्ते – 你好 – ہیلو – Comment ca va? – ਸਤ ਸਰੀ ਅਕਾਲ
3. 9/14/2018 Reuters Top 100: Gemalto rated among the top Global Tech Leaders https://www.thomsonreuters.com/en/products-services/technology/top-100.html
4. Agenda • Problem we had and wanted to solve with Elastic Stack • Intro to Elastic Stack (ecosystem) • Logstash • Kibana • Beats • Elasticsearch flow designs that we considered • Future plans for using Elasticsearch
5. How do you troubleshoot or find your bugs? • Typically in a distributed environment one has to go through the logs to find out where the issue is • You may have to go through multiple systems to find which machine/server generated the log, or monitor multiple logs at once • You may even monitor firewall logs to find which data center the traffic is routed through • Chuck Norris never troubleshoots; the troubles kill themselves when they see him coming
7. Our problem • We had distributed systems (microservices) that would generate many different types of logs, in different data centers • We also had authentication audit logs that had to be secure and stored for 1 year • We generate around 2 million audit log records a day, 4TB with replication • We need to generate reports out of our data for customers • We were still using a monolithic solution in some core parts of the application • Growing pains of a successful application • We wanted a centralized, scalable logging system for all our logs
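
As a rough illustration of the volumes on this slide, the sketch below works out how many audit records accumulate over the one-year retention period and what the average ingest rate is. The only inputs are the two figures from the slide (about 2 million records a day, kept for 1 year); the function names are made up for this example, and real traffic is unlikely to be evenly spread over the day.

```python
RECORDS_PER_DAY = 2_000_000      # "around 2 million records of audit logs a day"
RETENTION_DAYS = 365             # "stored for 1 year"
SECONDS_PER_DAY = 24 * 60 * 60


def retained_records(per_day: int = RECORDS_PER_DAY, days: int = RETENTION_DAYS) -> int:
    """Number of audit records held at once when the retention window is full."""
    return per_day * days


def average_rate(per_day: int = RECORDS_PER_DAY) -> float:
    """Average records per second, assuming (unrealistically) even traffic."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{retained_records():,} audit records retained over one year")
    print(f"{average_rate():.1f} records per second on average")
```
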
8. Finding bugs through logs
9. A little history of Elasticsearch • Shay Banon created Compass in 2004 • Released the first version of Elasticsearch in 2010 • Elasticsearch the company was formed in 2012 • Shay's wife is still waiting for her recipe app
11. Elastic Stack
12. Elasticsearch • Written in Java, backed by Lucene • Schema-free, REST and JSON based document store • Search engine • Distributed, horizontally scalable • No database storage; storage is Lucene • Apache 2.0 License
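
To make the "REST & JSON based document store" point concrete, here is a minimal sketch that builds (but does not send) the HTTP request for storing one document. The host `http://localhost:9200`, the index name `audit-logs`, the `_doc` type, and the document fields are assumptions chosen for illustration; the URL shape follows the 6.x-era `PUT /index/type/id` convention that was current at the time of the talk.

```python
import json
from urllib.request import Request


def build_index_request(base_url: str, index: str, doc_type: str,
                        doc_id: str, document: dict) -> Request:
    """Build a PUT request that would store `document` as JSON under index/type/id."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


# Hypothetical document and names, for illustration only.
request = build_index_request(
    "http://localhost:9200", "audit-logs", "_doc", "1",
    {"user": "alice", "event": "login", "success": True},
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request), against a running node.
```

Nothing here needs a client library: any tool that can issue HTTP requests with a JSON body can talk to the store.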
13. Companies using Elastic Stack
14. Elasticsearch indices • Elasticsearch organizes documents in indices • Lucene writes and maintains the index files • Elasticsearch writes and maintains metadata on top of Lucene • Example: field mappings, index settings and other cluster metadata
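
The metadata mentioned on this slide (field mappings and index settings) is what you supply when creating an index. The sketch below only builds the JSON body for such a request. It assumes the 6.x-style layout where mappings are nested under a type name (`_doc` here); later versions drop that level. The field names and the shard and replica counts are invented for the example.

```python
import json


def build_create_index_body(shards: int, replicas: int, properties: dict) -> dict:
    """Body for a create-index request: settings plus field mappings (6.x-style)."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "_doc": {"properties": properties},
        },
    }


# Hypothetical fields for an audit log index.
body = build_create_index_body(
    shards=5,
    replicas=1,
    properties={
        "timestamp": {"type": "date"},
        "user": {"type": "keyword"},
        "message": {"type": "text"},
    },
)
print(json.dumps(body, indent=2))
```
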
15. Database vs Elasticsearch
16. Elastic concepts • Cluster: A cluster is a collection of one or more nodes (servers) • Node: A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities • Index: An index is a collection of documents that have somewhat similar characteristics (e.g. Product, Customer) • Type: Within an index, you can define one or more types. A type is a logical category/partition of your index • Document: A document is a basic unit of information that can be indexed • Shard/Replica: An index is divided into multiple pieces called shards; replicas are copies of your shards
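
One way to picture how a document ends up in a particular shard is the routing rule "hash of the routing value, modulo the number of primary shards". The sketch below follows that rule but substitutes `zlib.crc32` for the hash, purely as a stand-in so the example runs with the standard library; Elasticsearch uses its own hash function, so the shard numbers here will not match a real cluster. The document ids are invented.

```python
import zlib
from collections import Counter


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a primary shard: hash(routing) % number_of_primary_shards.

    crc32 is a stand-in hash for illustration, not the one Elasticsearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]  # hypothetical document ids

# Documents spread across the five primary shards.
print(Counter(shard_for(doc_id, 5) for doc_id in doc_ids))

# Changing the shard count changes where existing documents would be looked up,
# which is why the number of primary shards is chosen when the index is created.
moved = sum(shard_for(d, 5) != shard_for(d, 6) for d in doc_ids)
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```
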
17. Elastic nodes • Master Node: controls the cluster • Data Node: Data nodes hold data and perform data related operations such as CRUD, search, and aggregations • Ingest Node: Ingest nodes are able to apply an ingest pipeline to a document in order to transform and enrich the document before indexing • Coordinating Node: only routes requests, handles the search reduce phase, and distributes bulk indexing
19. Elasticsearch CLUSTER
20. TYPICAL CLUSTER SHARD & REPLICA
21. Shard search and index
22. Demo of Elasticsearch
23. LOGSTASH • Ruby application that runs under JRuby on the JVM • Collects, parses, and enriches data • Horizontally scalable • Apache 2.0 License • Large number of public plugins written by the community • https://github.com/logstash-plugins
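
The "collects, parses, enriches" flow can be pictured as three small stages. The sketch below is plain Python, not Logstash configuration, and only mimics the idea: a regular expression plays the role of a parsing filter, a function adds an extra field, and the result is emitted as JSON. The log line format and the `datacenter` field are assumptions made up for the example.

```python
import json
import re
from typing import Iterable, List, Optional

# Assumed line format: "<timestamp> <LEVEL> <message>"
LINE_PATTERN = re.compile(r"(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)")


def parse(line: str) -> Optional[dict]:
    """Parse stage: turn a raw line into named fields, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


def enrich(event: dict, datacenter: str) -> dict:
    """Enrich stage: add context that was not in the original line."""
    return {**event, "datacenter": datacenter}


def run_pipeline(lines: Iterable[str], datacenter: str) -> List[str]:
    """Collect lines, parse and enrich them, and output one JSON string per event."""
    output = []
    for line in lines:
        event = parse(line.rstrip("\n"))
        if event is None:
            continue  # unparseable lines are skipped in this sketch
        output.append(json.dumps(enrich(event, datacenter), sort_keys=True))
    return output


events = run_pipeline(
    ["2018-09-14T10:00:00Z ERROR login failed", "garbage"],
    datacenter="ottawa",
)
print(events)
```
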
24. Typical usage of Logstash
26. Logstash input
27. Logstash filter
28. Logstash output
29. DEMO Logstash
30. Beats
31. Beats • Lightweight shippers written in Golang (non-JVM shops can use them) • They follow the Unix philosophy: do one specific thing, and do it well • Filebeat: log files (think of it as tail -f on steroids) • Metricbeat: CPU, memory (like top), Redis, MongoDB usage • Packetbeat: like Wireshark, uses libpcap to monitor network packets (HTTP, etc.) • Winlogbeat: Windows event logs to Elasticsearch • Dockbeat: monitoring Docker • Large community; lots of other beats offered as open source
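
The "tail -f on steroids" description of Filebeat boils down to remembering how far into a file you have read and shipping only what was appended since. The sketch below shows that idea in a few lines of Python; it is not Filebeat's implementation, and the function name and file contents are invented for the example.

```python
import os
import tempfile
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended after `offset`, plus the new offset.

    A trailing partial line (no newline yet) is left for the next call.
    """
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    end = data.rfind(b"\n") + 1          # 0 when no complete line is available
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


# Small demonstration with a temporary log file.
with tempfile.TemporaryDirectory() as folder:
    log_path = os.path.join(folder, "app.log")
    with open(log_path, "w", encoding="utf-8") as log:
        log.write("a\nb\npart")
    first_batch, position = read_new_lines(log_path, 0)

    with open(log_path, "a", encoding="utf-8") as log:
        log.write("ial\n")
    second_batch, position = read_new_lines(log_path, position)

print(first_batch, second_batch, position)
```

Persisting the offset between runs is what lets a shipper resume after a restart without resending old lines.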
33. FILEBEAT
34. X-Pack • Elastic's commercial offering (this is one of the ways they make money) • X-Pack is an Elastic Stack extension that bundles: – Security (HTTPS to Elasticsearch, password to access Kibana) – Alerting – Monitoring – Reporting – Graph capabilities – Machine learning
36. Kibana • Visual application for Elasticsearch (JS, Angular, D3) • Powerful dashboard frontend for visualizing index information from Elasticsearch • Historical data to form charts, graphs, etc. • Real-time search for index information
38. DEMO KIBANA
39. Designs we went through • We started with a simple design to measure throughput • One instance of Logstash and one instance of Elasticsearch with Filebeat
40. Dotnet Core app • We used a .NET Core application to generate logs • Serilog to write the logs in JSON format to a file • Filebeat was installed on the Linux machine to ship the logs to Logstash
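
The test application on this slide was .NET Core with Serilog. As a language-neutral illustration of the same idea (one JSON object per log line, ready for a shipper to pick up), here is a sketch using Python's standard `logging` module as a stand-in. The field names, logger name, and message are assumptions for the example and do not reflect Serilog's actual output format.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def make_logger(stream) -> logging.Logger:
    """Logger that writes JSON lines to `stream` (a file in a real setup)."""
    logger = logging.getLogger("demo-app")   # hypothetical application name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = make_logger(buffer)
log.info("user %s logged in", "alice")
print(buffer.getvalue().strip())
```
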
41. Performance elastic • 250 log items per second for 30 minutes
42. overview
43. logstash
44. Elasticsearch run two • 1000 logs per second, run for 30 minutes
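
For a sense of scale, the two test runs (250 and 1000 logs per second, each for 30 minutes) translate into total volumes as follows. This is plain arithmetic on the figures from the slides; the function name is made up for the example.

```python
RUN_SECONDS = 30 * 60  # both runs lasted 30 minutes


def total_logs(rate_per_second: int, duration_seconds: int = RUN_SECONDS) -> int:
    """Total log items produced at a constant rate over the run."""
    return rate_per_second * duration_seconds


if __name__ == "__main__":
    for rate in (250, 1000):
        print(f"{rate} logs/s for 30 minutes = {total_logs(rate):,} log items")
```
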
45. performance
46. Other designs
47. Other designs using Redis
48. Using Filebeat
49. Filebeat without relay
50. Log4j
51. Log4j direct
52. What we are going with for now, until…..
53. Considerations of data • Indexing by day makes sense in some cases • In others you may want to index by size instead (Black Friday brings more traffic than other days); Elasticsearch doesn’t like it when shards are not balanced • Don’t index everything; if you are not going to search on specific fields, mark them as text
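
The two indexing strategies on this slide can be sketched side by side: a date-based index name for the "index by day" case, and a set of rollover conditions for the "index by size" case. The helper names, the `logstash` prefix, and the `50gb` threshold are assumptions for illustration; the conditions body follows the shape used by the rollover API (`max_size`, `max_age`), but check the documentation for the version you run.

```python
from datetime import date
from typing import Optional


def daily_index_name(prefix: str, day: date) -> str:
    """Index-by-day: one index per calendar day, e.g. logstash-2018.09.14."""
    return f"{prefix}-{day:%Y.%m.%d}"


def rollover_conditions(max_size: str, max_age: Optional[str] = None) -> dict:
    """Index-by-size: request body asking to roll over once the index is big enough."""
    conditions = {"max_size": max_size}
    if max_age is not None:
        conditions["max_age"] = max_age
    return {"conditions": conditions}


print(daily_index_name("logstash", date(2018, 9, 14)))
print(rollover_conditions("50gb", max_age="7d"))
```

With size-based rollover, a busy day simply produces more indices of similar size rather than one oversized daily index.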
54. Future considerations • Investigate Elasticsearch machine learning • Elasticsearch with Kafka for cross data center replication • Logstash centralized pipeline for SIEM integrations
55. Thank you & open to questions – Questions??? – Contact: Taswar.bhatti@gemalto.com – LinkedIn (find me and add me)