Wanted to share my DevTeach talk slides on Elastic Search. Where I went into introducing the Elastic Stack. Consisting of Elastic Search, Logstash and Kibana. I also went into the constraints that we had and the design approaches that we took.

Hope you enjoy and expect more ElasticSearch blogs this year 🙂

Transcript

1. STORE 2 MILLION OF AUDIT LOGS A DAY INTO ELASTICSEARCH Taswar Bhatti (Microsoft MVP) GEMALTO @taswarbhatti http://taswar.zeytinsoft.co m taswar@gmail.com
2. WHO AM I? – 4 years Microsoft MVP – 17 years in software industry – Currently working as System Architect in Enterprise Security Space (Gemalto) – You may not have heard of Gemalto but 1/3 of the world population uses Gemalto they just dont know it – Gemalto has stacks build in many environnent .NET, Java, Node, Lua, Python, mobile (Android, IOS), ebanking etc 9/22/2017 2
3. AGENDA – Problem we had and wanted to solve with Elastic Stack – Intro to Elastic Stack (Ecosystem) – Logstash – Kibana – Beats – Elastic Search flows designs that we have considered – Future plans of using Elastic Search 9/22/2017 3
4. QUESTION & POLL – How many of you are using Elastic or some other logging solution? – How do you normally log? Where do you log? – Do you log in Relational Database? 9/22/2017 4
5. HOW DO YOU TROUBLESHOOT OR FIND YOUR BUGS – Typically in a distributed environment one has to go through the logs to find out where the issue is – Could be multiple systems that you have to go through which machine/server generated the log or monitoring multiple logs – Even monitor firewall logs to find traffic routing through which data center – Chuck Norris never troubleshoot; the trouble kills themselves when they see him coming 9/22/2017 5
6. 9/22/2017 6
7. OUR PROBLEM – We had distributed systems (microservices) that would generate many different types of logs, in different data centers – We also had authentication audit logs that had to be secure and stored for 1 year – We generate around 2 millions records of audit logs a day, 4TB with replications – We need to generate reports out of our data for customers – We were still using Monolith Solution in some core parts of the application – Growing pains of a successful application – We want to use a centralized scalable logging system for all our9/22/2017 7
8. FINDING BUGS THROUGH LOGS 9/22/2017 8
9. A LITTLE HISTORY OF ELASTICSEARCH – Shay Banon created Compass in 2004 – Released Elastic Search 1.0 in 2010 – ElasticSearch the company was formed in 2012 – Shay wife is still waiting for her receipe app 9/22/2017 9
10. 9/22/2017 10
11. ELASTIC STACK 9/22/2017 11
12. ELASTICSEARCH – Written in Java backed by Lucene – Schema free, REST & JSON based document store – Search Engine – Distributed, Horizontally Scalable – No database storage, storage is Lucene – Apache 2.0 License 9/22/2017 12
13. COMPANIES USING ELASTIC STACK 9/22/2017 13
14. ELASTICSEARCH INDICES – Elastic organizes document in indices – Lucene writes and maintains the index files – ElasticSearch writes and maintains metadata on top of Lucene – Example: field mappings, index settings and other cluster metadata 9/22/2017 14
15. DATABASE VS ELASTIC 9/22/2017 15
16. ELASTIC CONCEPTS – Cluster : A cluster is a collection of one or more nodes (servers) – Node : A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities – Index : An index is a collection of documents that have somewhat similar characteristics. (e.g Product, Customer, etc) – Type : Within an index, you can define one or more types. A type is a logical category/partition of your index. – Document : A document is a basic unit of information that can be indexed – Shard/Replica: Index divided into multiple pieces called shards, replicas are copy of your shards9/22/2017 16
17. ELASTIC NODES – Master Node : which controls the cluster – Data Node : Data nodes hold data and perform data related operations such as CRUD, search, and aggregations. – Ingest Node : Ingest nodes are able to apply an ingest pipeline to a document in order to transform and enrich the document before indexing – Coordinating Node : only route requests, handle the search reduce phase, and distribute bulk indexing. 9/22/2017 17
18. SAMPLE JSON DOCUMENT HTTP CALL JSON DOCUMENT 9/22/2017 18
19. ELASTICSEARCH CLUSTER 9/22/2017 19
20. TYPICAL CLUSTER SHARD & REPLICA 9/22/2017 20
21. SHARD SEARCH AND INDEX 9/22/2017 21
22. DEMO OF ELASTICSEARCH 9/22/2017 22
23. LOGSTASH – Ruby application runs under JRuby on the JVM – Collects, parse, enrich data – Horizontally scalable – Apache 2.0 License – Large amount of public plugins written by Community https://github.com/logstash- plugins 9/22/2017 23
24. TYPICAL USAGE OF LOGSTASH 9/22/2017 24
25. 9/22/2017 25
26. LOGSTASH INPUT 9/22/2017 26
27. LOGSTASH FILTER 9/22/2017 27
28. LOGSTASH OUTPUT 9/22/2017 28
29. DEMO LOGSTASH 9/22/2017 29
30. BEATS 9/22/2017 30
31. BEATS – Lightweight shippers written in Golang (Non JVM shops can use them) – They follow unix philosophy; do one specific thing, and do it well – Filebeat : Logfile (think of it tail –f on steroids) – Metricbeat : CPU, Memory (like top), redis, mongodb usage – Packetbeat : Wireshark uses libpcap, monitoring packet http etc – Winlogbeat : Windows event logs to elastic – Dockbeat : Monitoring docker – Large community lots of other beats offered as opensource 9/22/2017 31
32. 9/22/2017 32
33. FILEBEAT 9/22/2017 33
34. X-PACK – Elastic commercial offering (This is one of the ways they make money) – X-Pack is an Elastic Stack extension that bundles – Security (https to elastic, password to access Kibana) – Alerting – Monitoring – Reporting – Graph capabilities – Machine Learning 9/22/2017 34
35. 9/22/2017 35
36. KIBANA – Visual Application for Elastic Search (JS, Angular, D3) – Powerful frontend for dashboard for visualizing index information from elastic search – Historical data to form charts, graphs etc – Realtime search for index information 9/22/2017 36
37. 9/22/2017 37
38. DEMO KIBANA 9/22/2017 38
39. DESIGNS WE WENT THROUGH – We started with simple design to measure throughput – One instance of logstash and one instance of ElasticSearch with filebeat 9/22/2017 39
40. DOTNET CORE APP – We used a dotnetcore application to generate logs – Serilog to generate into json format and stored on file – Filebeat was installed on the linux machine to ship the logs to logstash 9/22/2017 40
41. PERFORMANCE ELASTIC – 250 logs item per second for 30 minutes 9/22/2017 41
42. OVERVIEW 9/22/2017 42
43. LOGSTASH 9/22/2017 43
44. ELASTIC SEARCH RUN TWO – 1000 logs per second, run for 30 minutes 9/22/2017 44
45. PERFORMANCE 9/22/2017 45
46. OTHER DESIGNS 9/22/2017 46
47. WHAT WE ARE GOING WITH FOR NOW, UNTIL….. 9/22/2017 47
48. CONSIDERATIONS OF DATA – Index by day make sense in some cases – In other you may want to index by size rather (Black Friday more traffic than other days) when Shards are not balance ElasticSearch doesn’t like that – Don’t index everything, if you are not going to search on specific fields mark them as text 9/22/2017 48
49. FUTURE CONSIDERATIONS – Investigate into Elastic Search Machine learning – ElasticSearch with Kafka for cross data center replication 9/22/2017 49
50. THANK YOU & OPEN TO QUESTIONS – Questions??? – Contact: Taswar@gmail.com – Blog: http://Taswar.zeytinsoft.com – Twitter: @taswarbhatti – LinkedIn (find me and add me)